Abstract

It is shown how to resolve the apparent contradiction between the macroscopic
approach of phase space and the validity of the uncertainty relations. The main
notions of statistical mechanics are re-interpreted in a quantum-mechanical way,
the ergodic theorem and the H-theorem are formulated and proven (without
"assumptions of disorder"), followed by a discussion of the physical meaning of the
mathematical conditions characterizing their domain of validity.

Introduction 
0.1 

The object of the present paper is the clarification of the relations between the macro- 
scopic and the microscopic point of view of complex systems; that is, the discussion of 
the question why the known thermodynamic methods of statistical mechanics make it
possible to make statements about incompletely (viz., only macroscopically) known sys- 
tems that are correct most of the time. In particular, first, how the peculiar, seemingly 
irreversible behavior of entropy arises, and second, why the statistical properties of the 

* Translation of: Beweis des Ergodensatzes und des H-Theorems in der neuen Mechanik. Zeitschrift
für Physik 57: 30-70 (1929). Translated in 2009. Additions in the text by the translator are put in
square brackets. Footnotes in the original containing only citations have been moved to the main text. In the original, equations
and references are not numbered. The notation agrees essentially with the original, with the following
exceptions: $h/2\pi$ has been replaced with $\hbar$; the notation $[a, b]$ for intervals has been introduced to
simplify some sentences. In a few cases, misprints and other mistakes in formulas have been identified
by the translator, corrected in the text, and mentioned in a footnote. The translator is grateful to Wolf
Beiglböck for suggesting improvements and librarian Mei Ling Lo of Rutgers University for help with
the bibliography.

^Department of Mathematics, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-

(fictitious) micro-canonical ensemble can be attributed to the incompletely known (real)
system.^1 And these questions shall be attacked with the means of quantum mechanics.

In classical mechanics, it is known that these questions have led to the development 
of two elaborate theoretical systems: the statistical mechanics of Boltzmann and that 
of Gibbs. The former could not provide a final and satisfactory solution because it had 
to make essential use of so-called assumptions of disorder — and exactly to fathom the 
nature of this "disorder" is the real problem.^2 The latter would basically be adequate
for this program; however, it leads to a mathematical problem — the so-called quasi- 
ergodic problem — that has been and still is absolutely insurmountable. Only if the 
corresponding mathematical conjecture is valid, the Gibbsian theory succeeds. 

In general questions of principle, however, the new quantum mechanics differs from 
the classical mechanics by being remarkably simple;^3 it is due to this circumstance that
in quantum mechanics, if we follow the Gibbsian path, we can reach the goal with
relatively simple mathematical means. That is, it will be possible in what follows to
prove the ergodic theorem and the H-theorem (which are the two questions mentioned
above) without the need to recur to any assumption of disorder. But before speaking 
of them in more detail, we need to say more about the notion of the macroscopic in 
quantum mechanics. 



0.2 

The main difficulty with re-constructing the Gibbsian theory in quantum mechanics
is that the tool of "phase space" (i.e., for a system of $f$ degrees of freedom, the
$2f$-dimensional space described by the $f$ coordinates $q_1, \ldots, q_f$ and the $f$ momenta
$p_1, \ldots, p_f$) cannot be dispensed with: all of the important notions (energy surface,
phase cells, micro-canonical and canonical ensembles, etc.) are based on it. But the
phase space cannot be formed in quantum mechanics, since a coordinate $q_k$ and the
corresponding momentum $p_k$ are never simultaneously measurable; instead, their probable
errors (spreads) $\Delta q_k$ and $\Delta p_k$ are always related according to the uncertainty relation
$\Delta q_k \, \Delta p_k \geq \hbar/2$.^4 Moreover, it is impossible to specify, for a state of the system, two
intervals $I$, $J$ so that, with certainty, $q_k$ lies in $I$ and $p_k$ in $J$ (even if the product of
their lengths is much bigger than $\hbar/2$);^5 thus, not only the continuous phase space but

^1 We are thinking of closed and isolated systems. For a system in contact with a large heat reservoir
it is known that the so-called canonical ensemble is appropriate. However, this case can easily be
reduced, with the methods of statistical mechanics, to the former, by including the heat reservoir into
the system.

^2 For a critical discussion of this matter (also concerning our subsequent remarks) see [SJH].
^3 For many special problems it is, of course, the other way around.
^4 See [9] and [1]. Concerning the limit $\hbar/2$ see, e.g., [23, p. 272].

^5 That is, if the wave function $\varphi(q_1, \ldots, q_f)$ vanishes for all values of $q_k$ outside a finite interval $I$
then, expanding

$$\varphi(q_1, \ldots, q_f) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} c(p_1, \ldots, p_f)\, e^{\frac{i}{\hbar}(p_1 q_1 + \cdots + p_f q_f)}\, dp_1 \cdots dp_f,$$

the Fourier coefficients $c(p_1, \ldots, p_f)$ must again and again become $\neq 0$ for arbitrarily large $p_k$.

also a discrete partition thereof into cells is meaningless! Still, it is obviously factually
correct that in macroscopic measurements the coordinates and momenta are measured 
simultaneously — indeed, the idea is that that becomes possible through the inaccuracy 
of the macroscopic measurement, which is so great that we need not fear a conflict with 
the uncertainty relations. How are these two statements, contradicting each other, to 
be reconciled? 

We believe that the following interpretation is the correct one: In a macroscopic 
measurement of coordinate and momentum (or two other quantities that cannot be 
measured simultaneously according to quantum mechanics), really two physical quanti- 
ties are measured simultaneously and exactly, which however are not exactly coordinate 
and momentum. They are, for example, the orientations of two pointers or the locations
of two spots on photographic plates,^6 and nothing keeps us from measuring these
simultaneously and with arbitrary accuracy; only their relation to the really interesting
physical quantities ($q_k$ and $p_k$) is somewhat loose, namely the uncertainty of this coupling
required by the laws of nature corresponds to the uncertainty relation (cf. Footnote 4).

Formulated mathematically, quantum mechanics attributes to the quantities $q_k$ and
$p_k$ the well-known operators $Q_k = q_k \cdots$ and $P_k = \frac{\hbar}{i} \frac{\partial}{\partial q_k} \cdots$ pTO], whose lack of
commutability ($Q_k P_k \neq P_k Q_k$; the difference is, as is well known, $i\hbar 1$) corresponds to the
lack of simultaneous measurability of these quantities [3, 9]. We now assume that two
other, commuting, operators $Q'_k$, $P'_k$ exist whose difference from $Q_k$ (respectively, $P_k$)
is so small that its size is characterized by numbers $\Delta Q_k$ and $\Delta P_k$ whose product does
not significantly exceed the value $\hbar/2$ required by the uncertainty relation. (Of course,
it cannot be less than that because of $Q_k P_k - P_k Q_k = i\hbar 1$, $Q'_k P'_k - P'_k Q'_k = 0$!) A
somewhat different formulation that achieves (as one easily sees) the same arises from
the following consideration: The commuting operators $Q'_k$, $P'_k$ must possess a complete
orthogonal system of common eigenfunctions,^7 denoted $\varphi_1, \varphi_2, \ldots$. Thereof we have to
require that in every state $\varphi_n$ the spreads of $Q_k$ and $P_k$ are less than $\Delta Q_k$ and $\Delta P_k$
(where $\Delta Q_k \, \Delta P_k \sim \hbar/2$). Then a simultaneous measurement of $Q'_k$ and $P'_k$, which must
lead to a state $\varphi_n$, does indeed provide simultaneous information about $Q_k$ and $P_k$. By
the way, it suffices to select the orthogonal system $\varphi_1, \varphi_2, \ldots$ as described above;
$Q'_k$ and $P'_k$ can then easily be chosen. After all, it suffices to specify their respective
eigenvalues in the states $\varphi_n$ ($n = 1, 2, \ldots$), which it is advantageous to take to be the
expectation values of $Q_k$ and $P_k$ in the state $\varphi_n$.^8

^6 For example, one may think of the coordinate and momentum of a particle in the sense of the
citations of Footnote 4 as measured in the following way: On the one hand (coordinate), let the particle
be illuminated by a bundle of light focussed on its approximate position, on the other hand (momentum)
by a quite monochromatic and plane wave bundle of light, with the reflected light photographed after 
passing a prism in order to determine the wave length. Of course, the inaccuracies must satisfy the 
uncertainty relation. In this way one obtains, on two photographic plates, two spots determining 
coordinate and momentum with said inaccuracy. 

^7 For the sake of simplicity, we assume that the actually measured quantities $Q'_k$, $P'_k$ have pure point
spectra, which should be the case if the available volume is finite. The existence of a system of common
eigenfunctions can be proved in the same way as for usual (finite dimensional) matrices [TJ [TU] .

^8 That is, $\int \cdots \int q_k\, |\varphi_n(q_1 \ldots q_f)|^2\, dq_1 \cdots dq_f$ and $\int \cdots \int \frac{\hbar}{i} \frac{\partial}{\partial q_k} \varphi_n(q_1 \ldots q_f)\, \varphi_n^*(q_1 \ldots q_f)\, dq_1 \cdots dq_f$.

This plausible assumption can be confirmed mathematically: For any two positive
numbers $\varepsilon$, $\eta$ with $\varepsilon\eta = C\hbar/2$ (where $C$ is a constant, see Footnote 9), there is a complete
orthogonal system $\varphi_1, \varphi_2, \ldots$ such that in every state $\varphi_n$ the spreads of $Q_k$ and $P_k$
are smaller than $\varepsilon$ (respectively, $\eta$). To specify the $\varphi_n$ and to prove their properties
requires somewhat cumbersome calculations,^10 which we do not reproduce here since
the important aspects should be sufficiently clear from the above description.

Thus, we make the assumption about the nature of macroscopic measurements that 
simultaneously measurable quantities (with pairwise commuting operators) are being 
measured, which are coupled to the primitive and not simultaneously measurable phys- 
ical quantities (coordinates, momenta, etc.) just so accurately as allowed by the un- 
certainty relations. How to carry this out in detail will be shown in the course of this 
paper. 
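
The coupling described above can be made concrete with a single Gaussian wave packet, for which the product of the spreads of coordinate and momentum attains the value $\hbar/2$. The following is a minimal numerical sketch, not part of the original paper: the grid, the units ($\hbar = 1$), and the function names `gaussian_packet` and `spreads` are assumptions chosen for illustration, and the momentum operator $\frac{\hbar}{i}\frac{\partial}{\partial q}$ is approximated by finite differences.

```python
import numpy as np

HBAR = 1.0  # assumption: natural units with hbar = 1


def gaussian_packet(q, a, b, sigma, hbar=HBAR):
    """Normalized Gaussian packet on the grid q with position mean a,
    momentum mean b, and position spread sigma."""
    psi = np.exp(-(q - a) ** 2 / (4.0 * sigma ** 2) + 1j * b * q / hbar)
    dq = q[1] - q[0]
    return psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dq)


def spreads(q, psi, hbar=HBAR):
    """Return (mean_q, spread_q, mean_p, spread_p) for a wave function
    sampled on a uniform grid q."""
    dq = q[1] - q[0]
    prob = np.abs(psi) ** 2
    mean_q = np.sum(q * prob) * dq
    var_q = np.sum((q - mean_q) ** 2 * prob) * dq
    # P = (hbar / i) d/dq, approximated by central differences.
    p_psi = (hbar / 1j) * np.gradient(psi, dq)
    # Expectation value (P psi, psi); real because P is Hermitian.
    mean_p = np.real(np.sum(p_psi * np.conj(psi)) * dq)
    # (P^2 psi, psi) equals the squared norm of P psi.
    mean_p2 = np.sum(np.abs(p_psi) ** 2) * dq
    return mean_q, np.sqrt(var_q), mean_p, np.sqrt(mean_p2 - mean_p ** 2)


if __name__ == "__main__":
    q = np.linspace(-20.0, 20.0, 20001)
    psi = gaussian_packet(q, a=1.0, b=2.0, sigma=0.7)
    mean_q, dq_spread, mean_p, dp_spread = spreads(q, psi)
    print(mean_q, mean_p, dq_spread * dp_spread)
```

Under these assumptions the product of the two spreads should come out close to $\hbar/2$, which is the sense in which such packets exploit everything the uncertainty relation leaves open.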

0.3 

About the formalism of quantum mechanics in general we say the following. The states
of a system are known to be characterized by the so-called wave functions, complex
functions $\varphi = \varphi(q_1, \ldots, q_f)$ defined on the "configuration space", the $f$-dimensional space
described by the $f$ coordinates $q_1, \ldots, q_f$. The physical quantities are characterized by
the Hermitian operators $A, B, \ldots$.^11 The most important operations with wave functions
are: the "inner product"

$$(\varphi, \psi) = \int \cdots \int \varphi(q_1, \ldots, q_f)\, \psi(q_1, \ldots, q_f)^* \, dq_1 \cdots dq_f \qquad (1)$$

(where $*$ denotes the complex conjugate) and the "norm"^12

$$\|\varphi\| = \sqrt{(\varphi, \varphi)} = \sqrt{\int \cdots \int |\varphi(q_1, \ldots, q_f)|^2 \, dq_1 \cdots dq_f}. \qquad (2)$$

The simplest description of a state by means of a wave function $\varphi$ is obtained in this
way: the expectation value of the quantity $A$ in the state $\varphi$ is equal to $(A\varphi, \varphi)$. The



^9 One sees that C ~ 1 would be the ideal estimate (which exploits all possibilities left by the
uncertainty relation). The author succeeded only in computing C < 3.6 [Note of the translator: 3 years 
later in his book, von Neumann repeated this claim with C ~ 60, so maybe the bound C < 3.6 was 
incorrectly calculated; see also Section 2.2 of the commentary], but since the value of h/2 in macroscopic 
(centimeter-gram-second) units is approximately 10~^^, the difference does not really matter. 

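^10 One should use the wave packets used by Heisenberg [9], $\exp\left(-\frac{1}{4\sigma^2} q^2 + \left(\frac{a}{2\sigma^2} + \frac{i}{\hbar} b\right) q\right)$ (where we write
$q$ for $q_k$ and ignore the other $q_1, \ldots, q_f$, so that $Q = q \cdots$ and $P = \frac{\hbar}{i} \frac{\partial}{\partial q} \cdots$ have the means $a$ respectively
$b$ and the spread squares $\sigma^2$ respectively $\left(\frac{\hbar}{2\sigma}\right)^2$), with $a = \sqrt{4\pi/C}\, \varepsilon i$, $b = \sqrt{4\pi/C}\, \eta j = \sqrt{C\pi}\, (\hbar/\varepsilon) j$,
$\sigma = \varepsilon/\sqrt{C}$, where $i, j = 0, \pm 1, \pm 2, \ldots$. The functions thus defined should be written in arbitrary order
as a sequence and then orthogonalized according to the procedure of E. Schmidt [15]. This yields the
desired $\varphi_1, \varphi_2, \ldots$.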

^11 In the following, the terminology and notation follows that of [19]. Everything needed for the
present purposes will be collected presently.

^12 The calculus with these is outlined, e.g., in [18].

specification of all expectation values provides, as it includes the expectation values of
all powers (i.e., the so-called higher moments of a probability distribution), knowledge
of the entire probability distribution of every quantity, and thus a complete statistical
characterization of the system [Sj [19] .

We also need the statistics of quantities in the system in case that, instead of a single
state $\varphi$, we encounter a mixture of several states $\varphi_1, \varphi_2, \ldots$ with respective probabilities
$w_1, w_2, \ldots$. Then the expectation value of $A$ is, obviously, equal to $\sum_n w_n (A\varphi_n, \varphi_n)$,
which is advantageously written in a different way. Let us describe, in any complete
orthogonal system, $A$ by a matrix $a_{\mu\nu}$ and each $\varphi_n$ by a vector $x^n_\mu$ ($\mu, \nu = 1, 2, \ldots$) [18].
Then

$$\sum_n w_n (A\varphi_n, \varphi_n) = \sum_n \sum_{\mu,\nu} w_n a_{\mu\nu} x^{n*}_\mu x^n_\nu = \sum_{\mu,\nu} a_{\mu\nu} \sum_n w_n x^n_\nu x^{n*}_\mu, \qquad (3)$$

so that, if $U$ is the operator with matrix $\sum_n w_n x^n_\mu x^{n*}_\nu$, this is the trace of $AU$.^13 Thus
the statistical behavior of the above mixture of several states is characterized by the
operator $U$, on the basis of the rule: the expectation value of $A$ is equal to $\operatorname{tr}(AU)$.
We call $U$ the statistical operator of the mixture; one sees that $U$ suffices for describing
the mixture, and it is unnecessary to specify the individual states from which it was
composed.

By the way, it is convenient to introduce a symbol $P_\varphi$ for the operator with the
matrix $x_\mu x^*_\nu$ (where $x_\mu$ is the vector of the wave function $\varphi$). It is easy to verify
the equivalent definition $P_\varphi f = (f, \varphi)\varphi$ (where $f$ is any other wave function). Then,
$U = \sum_n w_n P_{\varphi_n}$; in particular, $P_\varphi$ is the statistical operator of the pure state $\varphi$.
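
The rule that the expectation value in a mixture equals $\operatorname{tr}(AU)$ can be checked in a finite-dimensional setting. The sketch below is an illustration only: the dimension, the random states, and the helper names `projector`, `statistical_operator`, and `expectation_in_state` are assumptions, with vectors standing in for wave functions and the inner product $(f, g) = \sum_\mu f_\mu g_\mu^*$.

```python
import numpy as np


def projector(phi):
    """Matrix of P_phi, with entries x_mu * conj(x_nu), for a normalized vector phi."""
    return np.outer(phi, np.conj(phi))


def statistical_operator(states, weights):
    """U = sum_n w_n P_{phi_n} for normalized states phi_n with probabilities w_n."""
    return sum(w * projector(phi) for phi, w in zip(states, weights))


def expectation_in_state(A, phi):
    """(A phi, phi) = sum_mu (A phi)_mu * conj(phi_mu)."""
    return np.vdot(phi, A @ phi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    A = (m + m.conj().T) / 2  # a Hermitian matrix standing in for a physical quantity
    states = []
    for _ in range(3):
        v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
        states.append(v / np.linalg.norm(v))
    weights = [0.5, 0.3, 0.2]
    U = statistical_operator(states, weights)
    direct = sum(w * expectation_in_state(A, s) for s, w in zip(states, weights))
    print(abs(direct - np.trace(A @ U)))  # agreement up to rounding
```

Note that the states in the mixture need not be orthogonal; only the weights and the normalization enter.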



0.4 

Now we can approach the (quantum mechanical) formulation of the ergodic theorem. 
We start by discussing two approaches that do not solve the real problem but will, we 
believe, help make the situation clearer and more transparent. 

The classical formulation of the ergodic theorem (more precisely, the quasi-ergodic 
theorem) asserts the following: A system's point in phase space will, in the course of its 
motion (determined by the differential equations of mechanics), come arbitrarily close 
to every point of its energy surface; indeed, the time it spends in any region of the
latter in the long time average is proportional to the measure of that region.^14 Thus, in
a given state the statistical properties of the time ensemble (corresponding to averaging
every quantity over all times) are identical to those of its micro-canonical ensemble. The

^13 See [ini 11]. The trace is the sum of the diagonal elements of the matrix; since it is a unitary
invariant, one can talk of the trace of an operator, without specifying a complete orthogonal system.

^14 As is well known, the measure to be considered is not the $(2f - 1)$-dimensional surface area of
the piece of energy surface but rather the [infinitesimal] $2f$-dimensional volume of a strip between
neighboring energy surfaces, i.e., the integral of the reciprocal [magnitude of the] gradient of the energy
over the region mentioned. The essential (and often ignored) difference between the two halves of
the above formulation of the quasi-ergodic theorem was emphasized by P. and T. Ehrenfest 016]: the
second half is indispensable for the foundation of the statistical mechanics of Gibbs.

latter is the mixture of all points of the energy surface, with regions of equal measure (as
in Footnote 14) given equal weight.

Now in quantum mechanics let $H$ be the energy operator, $\varphi_1, \varphi_2, \ldots$ its
eigenfunctions,^15 $W_1, W_2, \ldots$ the respective eigenvalues. A state

$$\psi = \sum_n a_n \varphi_n \qquad (4)$$

evolves with time $t$ ([be it] $> 0$, $= 0$, or $< 0$) according to the time-dependent Schrödinger
equation to

$$\psi_t = \sum_n a_n e^{i W_n t/\hbar} \varphi_n = \sum_n a_n(t) \varphi_n. \qquad (5)$$

We first need to scrutinize the concept of energy surface. The $|a_n(t)|^2 = |a_n|^2$
remain constant in the course of time, not only the energy expectation value $(H\psi_t, \psi_t) =
\sum_n |a_n(t)|^2 W_n$. Since the $|a_n(t)|^2$ characterize the entire statistics of energy,^16 we can
say: The law of energy conservation in classical mechanics, when transferred to quantum
mechanics, asserts not merely the conservation of the mean energy, but rather the
conservation of the whole probability distribution of the energy. If we defined a quantum
mechanical "energy surface" in the immediate way by

$$\sum_n |a_n|^2 W_n = \text{const.} \qquad (6)$$

then the ergodic theorem would be far from valid; after all, there exist infinitely many
constants of motion $|a_1|^2, |a_2|^2, \ldots$. Instead, the "energy surface" should be defined as

$$|a_1|^2 = \text{const.}_1, \quad |a_2|^2 = \text{const.}_2, \quad \ldots. \qquad (7)$$

We thus arrive at the question: Let

$$a_n = r_n e^{i\alpha_n} \quad (r_n \geq 0,\ 0 \leq \alpha_n < 2\pi), \qquad (8)$$

so that the energy surface consists of those

$$\psi' = \sum_n a'_n \varphi_n \quad \text{with} \quad a'_n = r_n e^{i\alpha'_n} \quad (0 \leq \alpha'_n < 2\pi), \qquad (9)$$

do the

$$a_n(t) = r_n e^{i(W_n t/\hbar + \alpha_n)} \qquad (10)$$

come arbitrarily close to all $a'_n$, i.e., do the $W_n t/\hbar + \alpha_n$ come arbitrarily close to the $\alpha'_n$
(modulo $2\pi$, of course, and for all $n = 1, 2, \ldots$)? And, how long are the relative sojourn
times in given intervals of $\alpha'_n$? Put differently: Will $W_n t/\hbar$ come arbitrarily close, for

^15 More precisely: a complete orthogonal system formed of eigenfunctions, i.e., a coordinate system
in which $H$ is diagonal. (We assume that there is no continuous spectrum.)

^16 For example, because they determine, according to $(H^k \psi_t, \psi_t) = \sum_n |a_n(t)|^2 W_n^k$, the expectation
values of all powers of energy, i.e., all moments of the energy statistics.

suitable $t$ and modulo $2\pi$, to any given collection $\alpha'_n - \alpha_n$ (for all $n = 1, 2, \ldots$), and
what are the relative sojourn times? According to theorems of Kronecker, for the former
behavior the linear independence of $W_n/\hbar$ over the integers is necessary and sufficient,
i.e., the condition that no relation of the form

$$x_1 \frac{W_1}{\hbar} + \cdots + x_n \frac{W_n}{\hbar} = 0 \qquad (11)$$

($n$ arbitrarily large but finite; $x_1, \ldots, x_n$ integers) obtains, except when $x_1 = \cdots = x_n = 0$
[m [12]. From further theorems of Weyl it follows that in this case also the sojourn
times are correct, i.e., proportional to the product of the lengths of the intervals [21].
So, in this formulation the [hypothesis of the] ergodic theorem amounts to the absence
of resonances between the terms $W_n/\hbar$ of the system.^17
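
The statement about sojourn times can be explored numerically for two phases. The following sketch is only an illustration under stated assumptions: the frequencies $1$ and $\sqrt{2}$ stand in for rationally independent $W_n/\hbar$, the initial phases $\alpha_n$ are set to zero (they merely shift the intervals), time is sampled on a uniform grid, and the function name `sojourn_fraction` is hypothetical.

```python
import numpy as np


def sojourn_fraction(freqs, lows, highs, t_max, n_samples):
    """Fraction of sampled times in [0, t_max] at which every phase
    freqs[n] * t (mod 2*pi) lies in the interval [lows[n], highs[n])."""
    t = np.linspace(0.0, t_max, n_samples)
    phases = np.mod(np.outer(t, freqs), 2.0 * np.pi)
    inside = np.all((phases >= lows) & (phases < highs), axis=1)
    return inside.mean()


if __name__ == "__main__":
    lows = np.array([0.0, 0.0])
    highs = np.array([np.pi, np.pi / 2.0])
    # Product of the relative interval lengths: (1/2) * (1/4).
    expected = np.prod((highs - lows) / (2.0 * np.pi))

    independent = sojourn_fraction(np.array([1.0, np.sqrt(2.0)]),
                                   lows, highs, 2.0e4, 1_000_001)
    print(independent, expected)

    # Resonant case: equal frequencies keep both phases identical, so
    # disjoint intervals for the two phases are never visited together.
    resonant = sojourn_fraction(np.array([1.0, 1.0]),
                                np.array([0.0, np.pi]),
                                np.array([np.pi / 2.0, 1.5 * np.pi]),
                                2.0e4, 1_000_001)
    print(resonant)
```

In the independent case the sampled fraction should approach the product of the relative interval lengths as the time window grows, whereas the resonant case misses parts of the torus entirely.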

However, we have actually asked too much, as the true essence of the ergodic theorem 
that is essential to all applications is, as already mentioned, the agreement between the 
time ensemble and micro-canonical ensemble — and not the question what the system's 
trajectory on the energy surface is. As we know from Section 0.3, to this end only 
agreement between the statistical operators of these two ensembles is needed (while, 
beyond that, their "true" composition from wave functions is undiscoverable) . 

Now $\psi_t$ has the statistical operator $P_{\psi_t}$, and we need to average this, on the one
hand, over all $t$ while keeping the $\alpha_n$ fixed (time ensemble), and, on the other hand, for
$t = 0$ over all $\alpha_n$ (micro-canonical ensemble, where we now write $\alpha_n$ instead of $\alpha'_n$). We
want to write $P_{\psi_t}$ as a matrix in the coordinate system $\varphi_1, \varphi_2, \ldots$; since

$$\psi_t = \sum_n r_n e^{i(W_n t/\hbar + \alpha_n)} \varphi_n, \qquad (12)$$

the $m, n$ component of $P_{\psi_t}$ equals

$$r_m r_n e^{i((W_m - W_n) t/\hbar + (\alpha_m - \alpha_n))}. \qquad (13)$$

Averaging this over all $\alpha_\ell$, we obtain $0$ for $m \neq n$ and $r_n^2$ for $m = n$. To ensure that
averaging over $t$ yields the same result, we must have that $(W_m - W_n)/\hbar \neq 0$ for $m \neq n$,
i.e., $W_m \neq W_n$. That is, there must not be degeneracies (a much weaker condition than
the previous one [i.e., rational-linear independence]!).
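
The role of non-degeneracy in this averaging can be seen in a small numerical experiment. This is a sketch under assumptions, not the paper's procedure: a three-level spectrum is chosen arbitrarily, $\hbar = 1$, the time average is approximated by a mean over a uniform grid of times, and the function name `time_averaged_projector` is hypothetical.

```python
import numpy as np


def time_averaged_projector(r, alpha, W, t_max, n_samples, hbar=1.0):
    """Mean over sampled t in [0, t_max] of the matrix with m, n entry
    r_m r_n exp(i((W_m - W_n) t / hbar + alpha_m - alpha_n))."""
    t = np.linspace(0.0, t_max, n_samples)
    # Row k holds the coefficients a_n(t_k) = r_n exp(i(W_n t_k / hbar + alpha_n)).
    a = r * np.exp(1j * (np.outer(t, W) / hbar + alpha))
    return np.einsum("tm,tn->mn", a, np.conj(a)) / n_samples


if __name__ == "__main__":
    r = np.sqrt(np.array([0.5, 0.3, 0.2]))
    alpha = np.array([0.3, 1.1, 2.0])

    # Non-degenerate spectrum: the off-diagonal entries average out.
    avg = time_averaged_projector(r, alpha, np.array([0.0, 1.0, 2.5]), 2000.0, 200001)
    print(np.round(np.abs(avg), 3))

    # Degenerate spectrum (W_1 = W_2): the 1, 2 entry does not depend on t.
    avg_deg = time_averaged_projector(r, alpha, np.array([1.0, 1.0, 2.5]), 2000.0, 200001)
    print(np.round(np.abs(avg_deg), 3))
```

With distinct eigenvalues the averaged matrix approaches the diagonal matrix with entries $r_n^2$, which is what the average over the phases gives; with a degeneracy an off-diagonal entry of modulus $r_m r_n$ survives.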

At this point we might think we have proved the ergodic theorem to a satisfactory 
extent. However, we cannot be satisfied with this result since it does not mention the role 
of the macroscopic. Indeed, we have dealt with a completely and exactly known system, 
for which, for example, the energy surface was described by the exact specification of all 
$|a_n|^2$. Thus, in order to treat the incompletely known systems of statistical mechanics,
we need to further modify the question we are asking.^18



^17 It may seem strange that the condition involves the $W_n/\hbar$ and not the $(W_m - W_n)/\hbar$, but this is
due to an imprecision in our consideration. A constant factor (of modulus 1) in the wave function is
meaningless (e.g., it drops out of the statistical operator $P_\psi$), and thus we should have required, what we
asked of the phases $W_n t/\hbar + \alpha_n$, only of the phase differences, for example $(W_n - W_1)t/\hbar + (\alpha_n - \alpha_1)$ for
$n = 2, 3, \ldots$. This leads again to condition (11) above, but now for the eigenfrequencies $(W_n - W_1)/\hbar$,
$n = 2, 3, \ldots$.

^18 Another hint showing that the theorem just proved cannot be the right ergodic theorem is that

0.5



This modification must consist primarily in re-interpreting the concept of energy surface 
in a macroscopic way, i.e., to extend the micro-canonical ensemble to a collection of 
all those states whose energy statistics cannot macroscopically be distinguished from 
that of the given state. Under such circumstances, also the agreement between time 
and microscopic [i.e., micro-canonical] average should only be required for macroscopic 
quantities. This weakening comes together with an essential strengthening that is made 
possible only by using the macroscopic perspective. Namely, we will show that for every 
state of the system the value of each (macroscopically measurable) quantity not only has 
time mean equal to the micro-canonical mean, but furthermore has small spread, i.e., 
the times at which the value deviates considerably from the mean are very infrequent. 

It is useful to compare this with the corresponding considerations of the classical 
theory. There, the above-mentioned theorem, which amounts to a justification of the 
statistical-mechanical methods, gets decomposed into two steps as follows: First it needs 
to be shown that for every quantity the time statistics coincides with the micro-canonical 
one; then that for so-called macroscopic quantities the micro-canonical statistics has 
small spread. The first claim is just the presently unprovable classical quasi-ergodic 
theorem, the second, in contrast, can easily be proved by means of combinatorial con- 
siderations of counting (see, in particular, O |6]). However, what we want to call the 
ergodic theorem is the above implication of both claims together. 

A more precise discussion will be provided in the course of this paper; here we 
just want to emphasize two points: First, our formulation of the ergodic theorem will 
require that the temporal behavior sketched above actually occurs for every initial state
of the system (every $\psi$) without exceptions (classically, one would admit exceptions in
lower-dimensional parts of the energy surface). Second, we emphasize that the true
state (about which we do calculations) is a wave function, i.e., something microscopic;
to introduce a macroscopic description of the state would mean to introduce disorder
assumptions, which is what we definitely want to avoid. Likewise, the energy operator
occurring in the time-dependent Schrödinger equation

$$\frac{\partial}{\partial t} \psi_t = \frac{i}{\hbar} H \psi_t \qquad (14)$$

(whose solution is (5)) must be represented in its exact (microscopic) form. (Of course,
this is different from what happens in the definition of the energy surface, as we will 
discuss later.) We will now elucidate the conditions that will turn out necessary for the 
validity of the ergodic theorem. 

0.6 

These conditions come in two groups, first those concerning the (microscopic) energy 
operator H, second those concerning the partition of the (macroscopic) energy surface 



its premise (non-degenerate energy) is too weak: it is still satisfied for a known counterexample to the
classical ergodic theorem! Cf. Section [

into phase cells, and the size of the latter. (What is meant quantum-mechanically by
energy surface, phase cells, and other objects in phase space, will be defined precisely;
at this point it suffices to operate with these terms in the way that was common in
pre-quantum-mechanical theory. In particular, by phase cells we mean the partition of
phase space that can be carried out by means of macroscopic measurements.)

Concerning the energy, we will find that the term differences (i.e., eigenfrequencies)
must be distinct, and likewise the terms themselves (non-degenerate!), i.e., if
$W_1, W_2, \ldots$ are the energy values then all $W_m - W_n$ (with $m \neq n$) are distinct and
likewise all $W_n$. (Though we might even admit infrequent exceptions!) As one can see,
this condition lies, with respect to its strength, between the two conditions found in
Section 0.4 [i.e., it is weaker than rational-linear independence and stronger than absence of
degeneracies]. We will convince ourselves in Section that it is a reasonable condition,
in particular one violated by the classical counterexamples to the ergodic theorem (ideal
gas without collisions, radiation in a cavity without absorption) and re-instated as valid
by the known (but only heuristically confirmed) counteractive measures (introduction
of collisions, absorption and emission).
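
For a finite list of energy values, the two requirements just stated (no degeneracies, and pairwise distinct term differences) can be tested directly. The sketch below is illustrative only: the function names, the tolerance `tol`, and the sample spectra are assumptions, and the equally spaced spectrum is used merely as a simple stand-in for a system with resonances.

```python
from itertools import permutations


def is_nondegenerate(W, tol=1e-9):
    """True if all energy values W[n] are pairwise distinct (up to tol)."""
    ordered = sorted(W)
    return all(b - a > tol for a, b in zip(ordered, ordered[1:]))


def has_distinct_gaps(W, tol=1e-9):
    """True if all differences W[m] - W[n] with m != n are pairwise distinct (up to tol)."""
    gaps = sorted(W[m] - W[n] for m, n in permutations(range(len(W)), 2))
    return all(b - a > tol for a, b in zip(gaps, gaps[1:]))


if __name__ == "__main__":
    equally_spaced = [0.5, 1.5, 2.5, 3.5]  # many coinciding term differences
    generic = [0.0, 1.0, 3.0, 7.0]         # all term differences distinct
    for name, W in [("equally spaced", equally_spaced), ("generic", generic)]:
        print(name, is_nondegenerate(W), has_distinct_gaps(W))
```

The equally spaced list passes the non-degeneracy test but fails the test on term differences, which shows how the second requirement is the stronger of the two.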

About the size of the phase cells we find the following: the number of states (quantum
orbits) in each phase cell has to be not only very large, but also on average quite large
compared to the number of phase cells. We postpone a more detailed interpretation of
this condition until later and mention here only the following: When we take the limit
$\hbar \to 0$ (i.e., let quantum mechanics tend to classical mechanics) while not changing the
macroscopic measuring technique, then the former number grows unboundedly while
the latter is constant; thus, our condition is satisfied better and better. Its validity is
thus guaranteed at least if the macroscopic measuring technique is much too coarse to
reach quantum effects (so that $\hbar$ is practically 0).

It remains to formulate the H-theorem, which we will prove, too. We can attribute
in an obvious way an entropy to every state $\psi$, and likewise to its micro-canonical
ensemble;^19 we can then study the temporal variation of the former and compare it
to the latter (which is, as one can easily show, always greater than or equal to the
former). As in classical mechanics, also here a monotonic increase of entropy is out of
the question, and so is a predominantly positive sign of its [time] derivative (or difference
quotient): the time reversal objection as well as the recurrence objection are valid in
quantum mechanics as well as in classical mechanics. Following the discussion of P. and
T. Ehrenfest of this issue [SIE], we see instead the following as the essential statement of
the H-theorem: the time average of the entropy of $\psi_t$ differs only little from the entropy
of the micro-canonical ensemble, and since the latter is an upper bound of the former,
we have that the entropy of $\psi_t$ will rarely be much less than the micro-canonical entropy.

We will see that the H-theorem holds under the same hypotheses as the ergodic
theorem. 

To sum up, in quantum mechanics one can prove the ergodic theorem and the H- 
theorem in full rigor and without disorder assumptions; thus, the applicability of the 



^19 Cf. the end of Section [TT3l where we will also say more about the relation between this entropy and
that defined by the author in pp] .

statistical-mechanical methods to thermodynamics is guaranteed without relying on
any further hypotheses.^20 Of course, this is compatible with the fact that also the
time-dependent Schrödinger equation, on which quantum mechanics is grounded, has
reversibility and recurrence properties just like the differential equations of classical
mechanics [1^, and therefore cannot alone explain irreversible phenomena.^21



0.7 

We would like to sketch the relation between this work and other quantum-mechanical 
investigations on questions of statistical mechanics and thermodynamics. The papers 
of Schrödinger [17], as well as of L. Nordheim [13], and W. Pauli [14] describe the
macroscopic situation by means of disorder assumptions, and therefore lie in a different 
alley of research. An earlier work of the author is based entirely on the microscopic 
perspective and has the converse goal: To determine the entropy value from assuming 
the validity of the phenomenological second law of thermodynamics. 

The author would like to express his deepest gratitude towards Mr. E. Wigner for 
numerous discussions in which the questions of this article have arisen. 



1 Quantum-Mechanical Formulation of the Concepts 
of the Gibbsian Statistical Mechanics 

1.1 

As we have said and justified in the introduction, we take for granted that all macroscopic 
observations that are possible at all are possible simultaneously. Thus, their operators all 
commute with each other, and so there is a complete orthogonal system $\omega_1, \omega_2, \ldots$ of wave
functions that are eigenfunctions for each of them (cf. Footnote 7). Here we expect that
among the $\omega_1, \omega_2, \ldots$ there are groups of many $\omega_n$ on which every macroscopic operator
possesses the same eigenvalue, for otherwise carrying out all macroscopically possible
observations would allow us to distinguish completely between all of the $\omega_1, \omega_2, \ldots$ (i.e.,
an absolutely precise determination of the state, which in general is not the case).
These groups we denote $\{\omega_{1,p}, \ldots, \omega_{s_p,p}\}$, $p = 1, 2, \ldots$ (replacing the one index $n =
1, 2, \ldots$ with two indices $p = 1, 2, \ldots$ and $\lambda = 1, \ldots, s_p$), i.e., the $\omega_{1,p}, \ldots, \omega_{s_p,p}$ are
degenerate eigenfunctions for all macroscopic quantities.^22 Thus, instead of the system

^20 Cf. Schrödinger [17], particularly the last section. Our results allow us to carry out his reasoning in
a compelling way without his "statistical assumption" (i.e., disorder assumption), and thus to reduce
it in full rigor to the ordinary statistical interpretation of quantum mechanics. This also answers
Schrödinger's question whether quantum mechanics also suffers from an "ergodic difficulty."

^21 However, quantum mechanics does know an irreversible elementary process: the measurement. It
is irreversible (see [2^, where the definition of this process is given in footnote 21 on page 283), but 
whether it is relevant to the irreversibility of reality we leave open. In this work, we do not deal with 
measurement. 

^22 A macroscopic quantity is one whose value can exactly be determined by means of macroscopic
measurements. Thus, if $A$ can assume all values between $-\infty$ and $+\infty$, and if it is characteristic of the

$\omega_{1,p}, \ldots, \omega_{s_p,p}$, any other system $\omega'_{1,p}, \ldots, \omega'_{s_p,p}$ obtained from the former by a unitary
transformation would serve the purpose just as well.

If all states of a group $\{\omega_{1,p}, \ldots, \omega_{s_p,p}\}$ get mixed with equal weights then one obtains
a statistical ensemble with the statistical operator

$$\frac{1}{s_p} E_p, \qquad E_p = \sum_{\lambda=1}^{s_p} P_{\omega_{\lambda,p}}. \qquad (15)$$

The operator $E_p$ does not change when the $\omega_{\lambda,p}$ get replaced with the $\omega'_{\lambda,p}$ just mentioned,
as one can easily verify. Every macroscopic operator $A$ has the $\omega_{\lambda,p}$ as eigenfunctions,
and thus is a linear combination of the $P_{\omega_{\lambda,p}}$ with the eigenvalues as coefficients,^23 and
since all $\omega_{\lambda,p}$ with the same $p$ have the same eigenvalue, $A$ is even a linear combination
of the $E_p$, as we note here for future use.

By the way, j-Ep is, as can be seen from the way it arises, the statistical operator of 
the ensemble in which all macroscopic quantities have the values corresponding to the 
p-th group (where the Sp quantum states have the same weight) — thus, j-Ep corresponds 
to the p-th one among the alternatives concerning the properties of the system that can 
be distinguished by macroscopic measurements. Therefore it is the equivalent of the 
"phase cells" of the Gibbsian statistical mechanics. The number Sp = tr Ep (tr means 
trace, cf. Footnote [T^ is the number of real (microscopic) states in this cell — its size is 
therefore a measure of the coarseness of the macroscopic perspective. 
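The two properties of a phase cell used above, namely that $E_p$ is unchanged by a unitary change of basis inside the cell and that $\operatorname{tr} E_p = s_p$, can be checked numerically. The following is only an illustrative sketch: the dimensions, the random basis, and the helper names `projector` and `random_orthonormal` are assumptions made for the example and do not come from the paper.

```python
import numpy as np

def projector(vectors):
    """Sum of the rank-one projectors onto the (orthonormal) columns of `vectors`."""
    return vectors @ vectors.conj().T

def random_orthonormal(rows, cols, rng):
    """Orthonormal columns from the QR decomposition of a random complex matrix."""
    raw = rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))
    q, _ = np.linalg.qr(raw)
    return q

rng = np.random.default_rng(0)
dim, s_p = 6, 3                                  # hypothetical sizes

omega = random_orthonormal(dim, s_p, rng)        # omega_{1,p}, ..., omega_{s_p,p}
mix = random_orthonormal(s_p, s_p, rng)          # unitary acting inside the cell
omega_prime = omega @ mix                        # the transformed system omega'

E_p = projector(omega)
E_p_prime = projector(omega_prime)
cell_size = np.trace(E_p).real                   # s_p = tr E_p
```

Here `E_p` and `E_p_prime` agree up to rounding, which is the invariance statement, and `cell_size` reproduces the number of states in the cell.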



1.2 

Let us now consider the energy operator H with the eigenfunctions $\varphi_1, \varphi_2, \ldots$ and the
eigenvalues $W_1, W_2, \ldots$, so

$$H = \sum_n W_n P_{\varphi_n} . \tag{16}$$

We emphasize that H is the exact energy and not any macroscopic approximation. 

In general, the $\varphi_n$ are different from the $\omega_{\lambda,p}$, and H is not a linear combination of
the $E_p$, since the energy is not a macroscopic quantity, as it cannot be measured with
absolute precision with macroscopic means.^14 With a certain (reduced) accuracy, however,
this is indeed possible, so that the energy eigenvalues $W_1, W_2, \ldots$ can be collected
in groups $\{W_{1,a}, \ldots, W_{S_a,a}\}$, $a = 1, 2, \ldots$ (again we replace the single index in $W_n$ and
$\varphi_n$, $n = 1, 2, \ldots$, with two indices, $W_{\rho,a}$ and $\varphi_{\rho,a}$ with $a = 1, 2, \ldots$, $\rho = 1, \ldots, S_a$) in such
a way that all $W_{\rho,a}$ with the same a are close to each other and only those with different

macroscopic inaccuracy that only intervals $[k, k+1)$ (for $k = 0, \pm 1, \pm 2, \ldots$) can be distinguished from
one another, then only $f(A)$ is macroscopically measurable, with $f$ the following function: $f(x) = k$
for $k \le x < k+1$ (for $k = 0, \pm 1, \pm 2, \ldots$). Cf., however, the discussion in Section 0.2 and Footnote [Sj

^13 A Hermitian operator with eigenfunctions $\chi_1, \chi_2, \ldots$ and respective eigenvalues $w_1, w_2, \ldots$ must be
equal to $\sum_n w_n P_{\chi_n}$. Cf. [19].

^14 For example, think of the situation of observing an ordinary gas. In principle, of course, an energy
with point spectrum can, under favorable circumstances, be measured with absolute precision: one can,
e.g., decide whether an oscillator is in the ground state or not.

a (i.e., the full groups) can be macroscopically distinguished. How do we formulate the
fact that we can macroscopically measure the membership of an energy value in a group?

We do this by means of a trick that we have already mentioned and applied several
times in [19]. Let $f_a(x)$ be the function that assumes the value 1 for $x = W_{1,a}, \ldots, W_{S_a,a}$
(for fixed a!) and is otherwise 0. Thus, $f_a(H)$ is a quantity that has the value 1 when
the energy value belongs to the aforementioned group, and is otherwise 0; therefore it
can be measured macroscopically. From

$$H = \sum_n W_n P_{\varphi_n} \tag{17}$$

it follows that

$$f_a(H) = \sum_n f_a(W_n) P_{\varphi_n} \tag{18}$$

(cf. [IS]), thus

$$f_a(H) = \sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}} , \tag{19}$$

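and this must be a linear combination of the $E_p$. Now the operator $\sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}}$ and
likewise each $E_p = \sum_{\lambda=1}^{s_p} P_{\omega_{\lambda,p}}$ are equal to their own squares, and any two different
$E_p$ have product 0;^15 this implies that in the aforementioned linear combination of the
$E_p$ each coefficient is equal to its own square, i.e., is either 0 or 1. Thus, $\sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}}$ is
simply the sum of some $E_p$, let them be called $E_{1,a}, \ldots, E_{N_a,a}$:

$$\sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}} = \sum_{\nu=1}^{N_a} E_{\nu,a} . \tag{22}$$

By taking the trace, this implies

$$S_a = \sum_{\nu=1}^{N_a} s_{\nu,a} . \tag{23}$$

Since the product of

$$\sum_{\nu=1}^{N_a} E_{\nu,a} \quad \text{and} \quad \sum_{\nu=1}^{N_b} E_{\nu,b} \qquad (a \neq b) \tag{24}$$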

^15 To prove this, we need to show for two arbitrary but distinct elements $\varphi$, $\psi$ of an orthogonal system
that $P_\varphi^2 = P_\varphi$, $P_\varphi P_\psi = 0$. Let $f$ be any wave function, then we have that (cf. Section 0.3)

$$P_\varphi^2 f = ((f, \varphi)\varphi, \varphi)\varphi = (f, \varphi)(\varphi, \varphi)\varphi = (f, \varphi)\varphi = P_\varphi f , \tag{20}$$

$$P_\varphi P_\psi f = ((f, \psi)\psi, \varphi)\varphi = (f, \psi)(\psi, \varphi)\varphi = 0 . \tag{21}$$

is, according to what we said before, equal to the sum of those $E_p$ appearing in both
sums, and since, on the other hand, it is also equal to the product of

$$\sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}} \quad \text{and} \quad \sum_{\rho=1}^{S_b} P_{\varphi_{\rho,b}} , \tag{25}$$

which vanishes, the sum of the common terms $E_p$ is 0. Therefore there are none, as the
sum of several $E_p$, i.e., of several $P_{\omega_{\lambda,p}}$, never vanishes.^16 Finally, the $E_{\nu,a}$ exhaust the
$E_p$ (so far we have seen merely that they re-index a subset in a one-to-one way); to see
this, it suffices to show that

$$\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} E_{\nu,a} = \sum_{p=1}^{\infty} E_p . \tag{26}$$

The left hand side is the sum of all $E_{\nu,a}$, and thus of all $P_{\varphi_{\rho,a}}$, and thus 1 (for a complete
orthogonal system $\chi_1, \chi_2, \ldots$, the sum of all $P_{\chi_n}$ equals 1,^17 and the $\varphi_{\rho,a}$ do form a
complete orthogonal system); the right hand side is the sum of all $E_p$, and thus of
all $P_{\omega_{\lambda,p}}$, and thus 1, too (also the $\omega_{\lambda,p}$ form a complete orthogonal system). Thus,
everything is proved.

We thus have that the $E_{\nu,a}$ and $s_{\nu,a}$ with $a = 1, 2, \ldots$, $\nu = 1, \ldots, N_a$ is just a different
way of indexing the $E_p$ and $s_p$ with $p = 1, 2, \ldots$. Correspondingly, we write $\omega_{\lambda,\nu,a}$ for
$\omega_{\lambda,p}$. We introduce

$$\Delta_a = \sum_{\rho=1}^{S_a} P_{\varphi_{\rho,a}} = \sum_{\nu=1}^{N_a} E_{\nu,a} . \tag{27}$$

We see that $\frac{1}{S_a}\Delta_a$ is the mixture of the states $\varphi_{1,a}, \ldots, \varphi_{S_a,a}$ with equal weights, or,
alternatively, the mixture of the mixtures $\frac{1}{s_{1,a}}E_{1,a}, \ldots, \frac{1}{s_{N_a,a}}E_{N_a,a}$ (considered above as
corresponding to phase cells) with weights proportional to $s_{1,a}, \ldots, s_{N_a,a}$.

The analoga of these concepts in the Gibbsian theory are, again, obvious: $\frac{1}{S_a}\Delta_a$
corresponds to the energy surface, i.e., to the micro-canonical ensemble, $N_a$ is the number
of phase cells $E_{\nu,a}$ on the energy surface, and $S_a = \operatorname{tr} \Delta_a$ is the number of true states
(i.e., of stationary quantum orbits) on it.

The macroscopically possible energy measurements thus decompose the totality of
conceivable states into the energy surfaces $\Delta_a$, $a = 1, 2, \ldots$; further energy measurements
(which would resolve the $\Delta_a$ into the $\varphi_{\rho,a}$, $\rho = 1, \ldots, S_a$) are not possible with these
means. However, other measurements are macroscopically possible, and they must refer
to quantities whose operators do not commute with H, i.e., which cannot be measured
simultaneously with the (microscopic) energy. Classically speaking, they must refer to
non-integrals of motion, i.e., to quantities that change with time.^18 These measurements

^16 From $P_{\omega'} + P_{\omega''} + \ldots = 0$ (with $\omega', \omega'', \ldots$ pairwise orthogonal) we obtain by multiplication with
$P_{\omega'}$ the equation $P_{\omega'} = 0$, which is certainly false.

^17 By inspecting the definition of $P_\varphi$ as a matrix in Section 0.3 we see that this is identical to the
usual form of completeness relation. Cf. also [12].

^18 For example, in a gas enclosed in a box K, the total energy of the molecules in the left half of K
can be measured macroscopically with certain accuracy, but it is not an integral and thus varies with
time.

decompose the energy surface into the phase cells $E_{\nu,a}$, $\nu = 1, \ldots, N_a$. A further
decomposition (resolving the $E_{\nu,a}$ into the $\omega_{\lambda,\nu,a}$, $\lambda = 1, \ldots, s_{\nu,a}$) is macroscopically
impossible.

We thus have that the quantity $N_a$ is a measure of the extent to which the macroscopic
methods of measuring are adequate for quantities that cannot simultaneously be
measured with energy, i.e., the extent to which the inaccuracy of macroscopic energy
measurements is determined by the uncertainty relations. The magnitude of the $s_{\nu,a}$
(i.e., of the phase cells $E_{\nu,a}$), on the other hand, is a measure of the inaccuracy of the
macroscopic methods as such, i.e., as a consequence of their imperfection. The inaccuracy
due to $N_a$ gets compensated by observations of non-integrals; it is not a weakness
of our measurement apparatuses, whereas the inaccuracy due to $s_{\nu,a}$ is. Finally,

$$S_a = \sum_{\nu=1}^{N_a} s_{\nu,a} \tag{28}$$

is a measure of the product of both, i.e., for the total, actual uncertainty of the energy.



1.3 

Suppose now we are given an arbitrary state $\psi$ (where the wave function $\psi$ is normalized,
i.e., $\|\psi\|^2 = (\psi, \psi) = 1$). The probability that macroscopic measurements on a
system in this state will yield the values corresponding to the phase cell $E_{\nu,a}$ is, according
to the known rules, the sum of the transition probabilities to the eigenfunctions
$\omega_{1,\nu,a}, \ldots, \omega_{s_{\nu,a},\nu,a}$ constituting $E_{\nu,a}$. Thus, it is

$$\sum_{\lambda=1}^{s_{\nu,a}} |(\psi, \omega_{\lambda,\nu,a})|^2 = \sum_{\lambda=1}^{s_{\nu,a}} (P_{\omega_{\lambda,\nu,a}}\psi, \psi) = (E_{\nu,a}\psi, \psi) . \tag{29}$$

In words, this is how strongly the cell $E_{\nu,a}$ is occupied in the state $\psi$. Likewise, the
probability of the energy value to belong to the group $\{W_{1,a}, \ldots, W_{S_a,a}\}$ is given by

$$\sum_{\rho=1}^{S_a} |(\psi, \varphi_{\rho,a})|^2 = \sum_{\rho=1}^{S_a} (P_{\varphi_{\rho,a}}\psi, \psi) = (\Delta_a\psi, \psi) . \tag{30}$$

Thus, it is the occupation number of the energy surface $\Delta_a$. We note that, in agreement
with these concepts,

$$\sum_{\nu=1}^{N_a} (E_{\nu,a}\psi, \psi) = (\Delta_a\psi, \psi) , \tag{31}$$

$$\sum_{a=1}^{\infty} (\Delta_a\psi, \psi) = (\psi, \psi) = 1 . \tag{32}$$

Now we are ready to define the micro-canonical ensemble pertaining to the state
$\psi$ by specifying its statistical operator. If one $(\Delta_a\psi, \psi)$ were 1 and the others 0,^19
we would of course have to take the statistical operator $\frac{1}{S_a}\Delta_a$ considered already in
Section 1.2.^20 But if several (or all) $(\Delta_a\psi, \psi)$ are nonzero, we define it to be the mixture
of the $\frac{1}{S_1}\Delta_1, \frac{1}{S_2}\Delta_2, \ldots$ with weights $(\Delta_1\psi, \psi), (\Delta_2\psi, \psi), \ldots$. Thus, the micro-canonical
ensemble has the statistical operator

$$U_\psi = \sum_{a=1}^{\infty} \frac{(\Delta_a\psi, \psi)}{S_a} \Delta_a . \tag{33}$$

Of course, this definition is really justified only afterwards by its success, i.e., by the fact
that only with this definition, the ergodic theorem and the H-theorem hold. (Practically,
of course, all but one $(\Delta_a\psi, \psi)$ are very small.)
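As a numerical illustration of definition (33), the following sketch builds $U_\psi$ for a toy system and checks that it has trace 1 and is unchanged when $\psi$ is replaced by a time-evolved state. Everything concrete here is an assumption made for the example: the two shell sizes, the choice of the energy eigenbasis as the standard basis, the random levels `W`, and the unit $\hbar = 1$.

```python
import numpy as np

shell_sizes = [4, 6]                      # hypothetical S_a for two energy shells
dim = sum(shell_sizes)

def shell_projectors(sizes):
    """Delta_a as diagonal projectors (energy eigenbasis = standard basis)."""
    total, start, projs = sum(sizes), 0, []
    for size in sizes:
        diag = np.zeros(total)
        diag[start:start + size] = 1.0
        projs.append(np.diag(diag))
        start += size
    return projs

def microcanonical(psi, deltas):
    """U_psi = sum_a (Delta_a psi, psi) / S_a * Delta_a."""
    U = np.zeros((len(psi), len(psi)), dtype=complex)
    for delta in deltas:
        u_a = np.vdot(psi, delta @ psi).real      # occupation of shell a
        S_a = np.trace(delta).real                # number of states in shell a
        U += (u_a / S_a) * delta
    return U

rng = np.random.default_rng(1)
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)

deltas = shell_projectors(shell_sizes)
U_psi = microcanonical(psi, deltas)

W = rng.normal(size=dim)                  # hypothetical energy levels, hbar = 1
psi_t = np.exp(1j * W * 2.7) * psi        # state at an arbitrary later time
U_psi_t = microcanonical(psi_t, deltas)
```

Because the evolution only changes the phases of the components in the energy eigenbasis, the shell occupations and hence `U_psi_t` coincide with those of the initial state.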

It remains to define the entropies of $\psi$ and $U_\psi$ (of the state and of the corresponding
(virtual) micro-canonical ensemble). The expressions for entropy given by the author in
[20] are not applicable here in the way they were intended, as they were computed from
the perspective of an observer who can carry out all measurements that are possible
in principle, i.e., regardless of whether they are macroscopic (for example, there every
pure state has entropy 0, only mixtures have entropies greater than 0!). If we take
into account that the observer can measure only macroscopically then we find different
entropy values (in fact, greater ones, as the observer is now less skilful and possibly can
therefore extract less mechanical work from the system); nevertheless, the theory can be
set up also in this case. How to do this has been discussed by E. Wigner;^21 the formulas
for the entropies $S(\psi)$, $S(U_\psi)$ of $\psi$ and $U_\psi$ read^22

$$S(\psi) = -\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} (E_{\nu,a}\psi, \psi) \ln \frac{(E_{\nu,a}\psi, \psi)}{s_{\nu,a}} , \tag{34}$$

$$S(U_\psi) = -\sum_{a=1}^{\infty} (\Delta_a\psi, \psi) \ln \frac{(\Delta_a\psi, \psi)}{S_a} . \tag{35}$$

By the way, these entropy formulas are identical to the usual ones based on Boltzmann's
definition of entropy (and Stirling's formula), as one sees by noting that the $(E_{\nu,a}\psi, \psi)$
(the $(\Delta_a\psi, \psi)$) are the relative occupation numbers of the phase cells (of the energy
surfaces) and the $s_{\nu,a}$ (the $S_a$) are the numbers of quantum orbits therein, i.e., their
so-called a-priori weights.
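The two entropy formulas can be evaluated directly from the occupation numbers and the cell sizes. The sketch below is illustrative only: the function name `macro_entropies`, the nested-list input format, and the example numbers (one energy surface with cells of 10 and 1000 states) are assumptions, and the convention $0 \ln 0 = 0$ is used for empty cells.

```python
import numpy as np

def macro_entropies(x, s):
    """Return (S(psi), S(U_psi)) from occupations x[a][nu] and cell sizes s[a][nu]."""
    S_psi, S_U = 0.0, 0.0
    for x_a, s_a in zip(x, s):
        x_a = np.asarray(x_a, dtype=float)
        s_a = np.asarray(s_a, dtype=float)
        occupied = x_a > 0                         # convention: 0 ln 0 = 0
        S_psi -= np.sum(x_a[occupied] * np.log(x_a[occupied] / s_a[occupied]))
        u_a, S_a = x_a.sum(), s_a.sum()            # shell occupation and shell size
        if u_a > 0:
            S_U -= u_a * np.log(u_a / S_a)
    return S_psi, S_U

sizes = [[10, 1000]]                               # hypothetical s_{nu,a}
S_state, S_micro = macro_entropies([[0.5, 0.5]], sizes)

# Occupations proportional to the cell sizes reproduce the micro-canonical value.
S_eq_state, S_eq_micro = macro_entropies([[10 / 1010, 1000 / 1010]], sizes)
```

In this example the state with equally occupied cells has a smaller entropy than its micro-canonical ensemble, while occupations proportional to the cell sizes give equality.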

^19 Note that all our "occupation numbers" are, by their nature, non-negative.

^20 In [19], general reasons are provided for the conclusion that always this statistical operator belongs
to that ensemble defined by requiring merely that the energy lies in the a-th group.

^21 Mr. E. Wigner has communicated his hitherto unpublished results on this topic to the author orally.
Here we shall use only those formulas necessary for the purpose at hand, while we need not enter into
the general theory.

^22 We have omitted the usual factor k (= Boltzmann constant), and thus introduced as the unit of
temperature "erg" per degree of freedom. [1 erg = 1 g cm^2/s^2 = 10^{-7} J]

2 Implementation of Proofs



2.1 

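The temporal evolution $\psi_t$ of the initial state $\psi$ is determined by the time-dependent
Schrödinger differential equation

$$\psi_0 = \psi , \qquad \frac{\partial}{\partial t}\psi_t = \frac{i}{\hbar} H \psi_t \tag{36}$$

with H the energy operator,

$$H = \sum_{a=1}^{\infty} \sum_{\rho=1}^{S_a} W_{\rho,a} P_{\varphi_{\rho,a}} . \tag{37}$$

Thus, if

$$\psi = \sum_{a=1}^{\infty} \sum_{\rho=1}^{S_a} r_{\rho,a} e^{i\alpha_{\rho,a}} \varphi_{\rho,a} \tag{38}$$

with $r_{\rho,a} \ge 0$ and $0 \le \alpha_{\rho,a} < 2\pi$ then

$$\psi_t = \sum_{a=1}^{\infty} \sum_{\rho=1}^{S_a} r_{\rho,a} e^{i(W_{\rho,a}t/\hbar + \alpha_{\rho,a})} \varphi_{\rho,a} . \tag{39}$$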

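We introduce the abbreviations

$$x_{\nu,a} = (E_{\nu,a}\psi_t, \psi_t) , \qquad u_a = (\Delta_a\psi_t, \psi_t) = (\Delta_a\psi, \psi) \tag{40}$$

(the last two expressions are equal because

$$(\Delta_a\psi_t, \psi_t) = \sum_{\rho=1}^{S_a} (P_{\varphi_{\rho,a}}\psi_t, \psi_t) = \sum_{\rho=1}^{S_a} |(\psi_t, \varphi_{\rho,a})|^2 = \sum_{\rho=1}^{S_a} r_{\rho,a}^2 \tag{41}$$

does not depend on t). As we see,

$$\sum_{\nu=1}^{N_a} x_{\nu,a} = u_a , \tag{42}$$

$$\sum_{a=1}^{\infty} u_a = 1 , \tag{43}$$

$x_{\nu,a}$ depends on t, $u_a$ does not.^23 From [our discussion above at] the definitions of
entropies we know that the $x_{\nu,a}$, $u_a$ are non-negative and that

$$S(\psi_t) = -\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} x_{\nu,a} \ln \frac{x_{\nu,a}}{s_{\nu,a}} , \qquad S(U_\psi) = -\sum_{a=1}^{\infty} u_a \ln \frac{u_a}{S_a} . \tag{44}$$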



^23 Thus, the micro-canonical ensemble [i.e., density matrix] $U_\psi = \sum_{a=1}^{\infty} (u_a/S_a)\Delta_a$ does not change
when $\psi$ is replaced with $\psi_t$.

Since the sum of all $x_{\nu,a}$ (or of all $u_a$) equals 1, they all lie in [0, 1], and thus both
entropies are always non-negative. We now discuss more closely their magnitudes.
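We note $0 \le x_{\nu,a} \le u_a$; we replace $x_{\nu,a}$ by a variable z and assume first that

$$0 \le z \le \frac{2 s_{\nu,a}}{S_a} u_a , \quad \text{i.e.,} \quad \left| \frac{S_a}{s_{\nu,a}u_a} z - 1 \right| \le 1 . \tag{45}$$

Then

$$z \ln\frac{z}{s_{\nu,a}} = z \ln\frac{u_a}{S_a} + z \ln\left(\frac{S_a}{s_{\nu,a}u_a} z\right) \tag{46}$$

$$= z \ln\frac{u_a}{S_a} + \frac{s_{\nu,a}u_a}{S_a}\left[1 + \left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)\right]\left[\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right) - \frac{1}{2}\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)^2 + \frac{1}{3}\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)^3 - \ldots\right] \tag{47}$$

$$= z \ln\frac{u_a}{S_a} + \frac{s_{\nu,a}u_a}{S_a}\left[\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right) + \frac{1}{1 \times 2}\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)^2 - \frac{1}{2 \times 3}\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)^3 + \ldots\right] . \tag{48}$$

Since

$$\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \ldots = 1 , \tag{49}$$

the sum of the absolute values of the last terms is no greater than

$$\frac{s_{\nu,a}u_a}{S_a}\left(\frac{S_a}{s_{\nu,a}u_a} z - 1\right)^2 = \frac{S_a}{s_{\nu,a}u_a}\left(z - \frac{s_{\nu,a}u_a}{S_a}\right)^2 , \tag{50}$$

and we can thus write

$$\left| -\frac{s_{\nu,a}u_a}{S_a}\ln\frac{u_a}{S_a} - \left(\ln\frac{u_a}{S_a} + 1\right)\left(z - \frac{s_{\nu,a}u_a}{S_a}\right) + z \ln\frac{z}{s_{\nu,a}} \right| \le \frac{S_a}{s_{\nu,a}u_a}\left(z - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{51}$$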



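In order to prove this also for the other values of z, we compare the left hand side
(without $|\cdots|$) with half of the right hand side. For $z = s_{\nu,a}u_a/S_a$ they both vanish,
and their derivatives are in general

$$-\left(\ln\frac{u_a}{S_a} + 1\right) + \left(\ln\frac{z}{s_{\nu,a}} + 1\right) = \ln\left(\frac{S_a}{s_{\nu,a}u_a} z\right) \tag{52}$$

and

$$\frac{S_a}{s_{\nu,a}u_a}\left(z - \frac{s_{\nu,a}u_a}{S_a}\right) = \frac{S_a}{s_{\nu,a}u_a} z - 1 . \tag{53}$$

Obviously, the former is always less than or equal to the latter and $\ge 0$ when

$$z \ge \frac{s_{\nu,a}u_a}{S_a} . \tag{54}$$

Thus, the left hand side of (51), while always non-negative, is $\le$ half the right
hand side of (51) for z as in (54). We thus have in general that

$$0 \le -\frac{s_{\nu,a}u_a}{S_a}\ln\frac{u_a}{S_a} - \left(\ln\frac{u_a}{S_a} + 1\right)\left(z - \frac{s_{\nu,a}u_a}{S_a}\right) + z \ln\frac{z}{s_{\nu,a}} \le \frac{S_a}{s_{\nu,a}u_a}\left(z - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{55}$$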



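Now we set $z = x_{\nu,a}$ and sum over $\nu = 1, \ldots, N_a$; since

$$\sum_{\nu=1}^{N_a} s_{\nu,a} = S_a , \qquad \sum_{\nu=1}^{N_a} x_{\nu,a} = u_a , \tag{56}$$

we obtain that

$$0 \le -u_a \ln\frac{u_a}{S_a} + \sum_{\nu=1}^{N_a} x_{\nu,a} \ln\frac{x_{\nu,a}}{s_{\nu,a}} \le \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{57}$$

If we sum also over $a = 1, 2, \ldots$, we obtain that

$$0 \le S(U_\psi) - S(\psi_t) \le \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{58}$$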


This estimate provides an ansatz for proving the H-theorem. We now proceed to
the ergodic theorem and find that it requires a bound on the same expression.
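The two-sided estimate for a single energy surface (the entropy gap is non-negative and bounded by the weighted quadratic deviation) lends itself to a quick numerical sanity check. The sketch below draws random cell sizes and occupations and evaluates both sides; the function name `entropy_gap_and_bound`, the sampling ranges, and the seed are assumptions chosen only for illustration, and a random test of this kind is of course no substitute for the proof.

```python
import numpy as np

def entropy_gap_and_bound(x, s):
    """For one energy surface return (gap, bound) with
    gap   = -u ln(u/S) + sum_nu x_nu ln(x_nu / s_nu),
    bound = sum_nu S / (s_nu u) * (x_nu - s_nu u / S)^2,
    where u = sum_nu x_nu and S = sum_nu s_nu."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    u, S = x.sum(), s.sum()
    gap = -u * np.log(u / S) + np.sum(x * np.log(x / s))
    bound = np.sum(S / (s * u) * (x - s * u / S) ** 2)
    return gap, bound

rng = np.random.default_rng(2)
results = []
for _ in range(1000):
    n_cells = rng.integers(2, 8)                       # number of phase cells
    s = rng.integers(1, 50, size=n_cells)              # hypothetical cell sizes
    x = rng.dirichlet(np.ones(n_cells)) * rng.uniform(0.1, 1.0)
    results.append(entropy_gap_and_bound(x, s))
```

Each entry of `results` is a pair (gap, bound); in all sampled cases the gap lies between 0 and the bound, up to rounding.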



2.2 

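Let A be a macroscopically observable quantity, i.e.,

$$A = \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a} E_{\nu,a} . \tag{59}$$

The $\omega_{\lambda,\nu,a}$ of the phase cell $E_{\nu,a}$ are eigenfunctions of A with eigenvalue $\eta_{\nu,a}$, i.e., $\eta_{\nu,a}$
is the value of A in the phase cell $E_{\nu,a}$. Thus, A has the following expectation values in
the state $\psi_t$ and in the micro-canonical ensemble $U_\psi$:

$$(A\psi_t, \psi_t) = \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}(E_{\nu,a}\psi_t, \psi_t) = \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}x_{\nu,a} , \tag{60}$$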

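$$\operatorname{tr}(A U_\psi) = \operatorname{tr}\left[\left(\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}E_{\nu,a}\right)\left(\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{u_a}{S_a}E_{\nu,a}\right)\right] \tag{61}$$

$$= \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{\eta_{\nu,a}s_{\nu,a}u_a}{S_a} . \tag{62}$$

(The number of terms gets reduced by the fact that $E_{\nu,a}E_{\mu,b} = 0$ except when $\nu = \mu$
and $a = b$, in which case $E_{\nu,a}E_{\mu,b} = E_{\nu,a}$ has the trace $s_{\nu,a}$.) We denote the values
(60) and (62) by $E_A(\psi_t)$ and $E_A(U_\psi)$. Using the Schwarz inequality, we can estimate:

$$(E_A(\psi_t) - E_A(U_\psi))^2 = \left(\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)\right)^2 \tag{63}$$

$$= \left(\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}\sqrt{\frac{s_{\nu,a}u_a}{S_a}} \cdot \sqrt{\frac{S_a}{s_{\nu,a}u_a}}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)\right)^2 \tag{64}$$

$$\le \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}^2\frac{s_{\nu,a}u_a}{S_a} \cdot \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{65}$$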


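The first factor we abbreviate $\bar\eta^2$; since

$$\frac{s_{\nu,a}u_a}{S_a} \ge 0 , \tag{66}$$

$$\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{s_{\nu,a}u_a}{S_a} = 1 , \tag{67}$$

$$\bar\eta^2 = \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \eta_{\nu,a}^2\frac{s_{\nu,a}u_a}{S_a} , \tag{68}$$

this is a weighted average of the values $\eta_{\nu,a}^2$ of $A^2$, in fact the micro-canonical average:
after all, $U_\psi$ is the mixture of the $(1/S_a)\Delta_a$ ($a = 1, 2, \ldots$) with weights $u_a$ and thus that
of the $(1/s_{\nu,a})E_{\nu,a}$ ($a = 1, 2, \ldots$; $\nu = 1, \ldots, N_a$) with weights $s_{\nu,a}u_a/S_a$, and $A^2$ has, as
we know, the value $\eta_{\nu,a}^2$ in $(1/s_{\nu,a})E_{\nu,a}$. Thus, $\bar\eta$ is a reasonable measure of the order of
magnitude of the quantity A. We thus have that

$$(E_A(\psi_t) - E_A(U_\psi))^2 \le \bar\eta^2 \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2 . \tag{69}$$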



2.3 

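Now we average over time, denoted by $M_t$. We thus obtain that

$$M_t\{|S(U_\psi) - S(\psi_t)|\} \le M_t\left\{\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2\right\} , \tag{70}$$

$$M_t\{(E_A(\psi_t) - E_A(U_\psi))^2\} \le \bar\eta^2 M_t\left\{\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2\right\} . \tag{71}$$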



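Thus, ergodic theorem and H-theorem will both be established when we have shown
that the $M_t\{\cdots\}$ on the right hand side is small uniformly for all initial states $\psi$ (i.e.,
all $r_{\rho,a}$, $\alpha_{\rho,a}$ with $\sum_{a=1}^{\infty} \sum_{\rho=1}^{S_a} r_{\rho,a}^2 = \|\psi\|^2 = 1$). (Note that while $x_{\nu,a}$ depends on t, $r_{\rho,a}$,
and $\alpha_{\rho,a}$, the $u_a$ depend only on $r_{\rho,a}$, and everything else is constant.)
In order to show this we first compute^24

$$x_{\nu,a} = (E_{\nu,a}\psi_t, \psi_t) \tag{72}$$

$$= \left(E_{\nu,a}\sum_{b=1}^{\infty} \sum_{\rho=1}^{S_b} r_{\rho,b}e^{i(W_{\rho,b}t/\hbar + \alpha_{\rho,b})}\varphi_{\rho,b} ,\ \sum_{b=1}^{\infty} \sum_{\rho=1}^{S_b} r_{\rho,b}e^{i(W_{\rho,b}t/\hbar + \alpha_{\rho,b})}\varphi_{\rho,b}\right) \tag{73}$$

$$= \sum_{\rho,\sigma=1}^{S_a} r_{\rho,a}r_{\sigma,a}e^{i((W_{\rho,a} - W_{\sigma,a})t/\hbar + (\alpha_{\rho,a} - \alpha_{\sigma,a}))}(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a}) . \tag{74}$$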

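Thus, using $\sum_{\rho=1}^{S_a} r_{\rho,a}^2 = u_a$,

$$x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a} = \sum_{\substack{\rho,\sigma=1 \\ \rho \ne \sigma}}^{S_a} r_{\rho,a}r_{\sigma,a}e^{i((W_{\rho,a} - W_{\sigma,a})t/\hbar + (\alpha_{\rho,a} - \alpha_{\sigma,a}))}(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a}) + \sum_{\rho=1}^{S_a} r_{\rho,a}^2\left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right] . \tag{75}$$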

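If we square this expression and average it over t then all terms containing $e^{ict}$ with
$c \ne 0$ vanish. Thus, if

$$\text{for } \rho \ne \sigma: \quad W_{\rho,a} - W_{\sigma,a} \ne 0 , \tag{76}$$

$$\text{for } \rho \ne \sigma,\ \rho' \ne \sigma': \quad (W_{\rho,a} - W_{\sigma,a}) - (W_{\rho',a} - W_{\sigma',a}) \ne 0 \tag{77}$$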

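unless $\rho = \rho'$, $\sigma = \sigma'$ (i.e., if for every fixed a all $W_{\rho,a}$ ($\rho = 1, 2, \ldots$) are distinct, and so
are all $W_{\rho,a} - W_{\sigma,a}$ ($\rho \ne \sigma$; $\rho, \sigma = 1, 2, \ldots$)), then we obtain that

$$M_t\left\{\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2\right\} = \sum_{\substack{\rho,\sigma=1 \\ \rho \ne \sigma}}^{S_a} r_{\rho,a}^2 r_{\sigma,a}^2 |(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 + \left(\sum_{\rho=1}^{S_a} r_{\rho,a}^2\left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right]\right)^2 . \tag{78}$$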
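Whether a given finite list of energy levels obeys the two conditions, distinct levels and pairwise distinct level differences, can be tested mechanically. The helper below is a small illustrative sketch; its name `satisfies_gap_conditions`, the numerical tolerance, and the example spectra are assumptions made for the illustration.

```python
from itertools import permutations

def satisfies_gap_conditions(levels, tol=1e-9):
    """True if all levels are distinct and all differences W_rho - W_sigma
    (over ordered pairs rho != sigma) are pairwise distinct, within `tol`."""
    pairs = permutations(range(len(levels)), 2)
    gaps = sorted(levels[r] - levels[s] for r, s in pairs)
    if any(abs(g) <= tol for g in gaps):           # a vanishing gap: degeneracy
        return False
    # After sorting, equal gaps are adjacent, so comparing neighbours suffices.
    return all(b - a > tol for a, b in zip(gaps, gaps[1:]))

generic = [0.0, 1.0, 3.0, 7.0]       # all differences distinct
equidistant = [0.0, 1.0, 2.0]        # 1 - 0 equals 2 - 1: a resonance
degenerate = [0.0, 0.0, 1.0]         # two equal levels
```

An equidistant spectrum such as that of a harmonic oscillator fails the second condition, while the first list passes both.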



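^24 The number of terms gets reduced by the fact that $(E_{\nu,a}\varphi_{\rho,b}, \varphi_{\sigma,c}) = (\varphi_{\rho,b}, E_{\nu,a}\varphi_{\sigma,c}) = 0$ unless
$a = b = c$. It suffices to show $E_{\nu,a}\varphi_{\rho,b} = 0$ for $a \ne b$, or (because of $E_{\nu,a}\Delta_a = E_{\nu,a}$, see Section 1.2)
that $\Delta_a\varphi_{\rho,b} = 0$. This follows from $\Delta_a = \sum_{\sigma=1}^{S_a} P_{\varphi_{\sigma,a}}$, since $\varphi_{\rho,b}$ is orthogonal to all $\varphi_{\sigma,a}$.

We now set

$$M_{\nu,a} = \max_{\substack{\rho,\sigma=1,\ldots,S_a \\ \rho \ne \sigma}} |(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 , \tag{79}$$

$$N_{\nu,a} = \max_{\rho=1,\ldots,S_a} \left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right]^2 , \tag{80}$$

where $M_{\nu,a}$, $N_{\nu,a}$ are constants, i.e., independent of t, $r_{\rho,a}$, $\alpha_{\rho,a}$, and thus of $\psi_t$. Since
$\sum_{\rho=1}^{S_a} r_{\rho,a}^2 = u_a$, we have that

$$M_t\left\{\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2\right\} \le \sum_{\rho,\sigma=1}^{S_a} r_{\rho,a}^2 r_{\sigma,a}^2 M_{\nu,a} + \left(\sum_{\rho=1}^{S_a} r_{\rho,a}^2\right)^2 N_{\nu,a} \tag{81}$$

$$\le u_a^2 (M_{\nu,a} + N_{\nu,a}) , \tag{82}$$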

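and thus

$$M_t\left\{\sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}u_a}\left(x_{\nu,a} - \frac{s_{\nu,a}u_a}{S_a}\right)^2\right\} \le \sum_{a=1}^{\infty} \sum_{\nu=1}^{N_a} \frac{S_a u_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) . \tag{83}$$

Because of $\sum_{a=1}^{\infty} u_a = 1$, this is

$$\le \max_{a=1,2,\ldots} \sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) , \tag{84}$$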

where it suffices to take the maximum over those a for which $u_a \ne 0$, i.e., whose energy
surfaces actually occur in the micro-canonical ensemble. Thus, we will have reached our
goal when we can prove, for these a, that

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) \tag{85}$$

is small; in fact, our result will then hold for all of these $\psi$, as the expression (85) is
constant, i.e., independent of $\psi$ (and t, $r_{\rho,a}$, $\alpha_{\rho,a}$); it only involves the $E_{\nu,a}$ (and thus
indirectly $S_a$, $N_a$, $s_{\nu,a}$, $\Delta_a$, and the $\omega_{\lambda,\nu,a}$). In order to bound the expression (85), we
need to bound the $M_{\nu,a}$ and $N_{\nu,a}$.
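The time-average formula for a single energy surface, and the resulting bound by the sum of the two maxima, can be illustrated on a toy example. The sketch below assumes one energy surface with $u_a = 1$, $\hbar = 1$, integer levels with pairwise distinct differences, and a randomly chosen phase cell; all of these choices and all variable names are assumptions for the illustration. Because the levels are integers, $x(t)$ is $2\pi$-periodic, and the mean over a uniform grid covering one period averages the trigonometric polynomial exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
W = np.array([0.0, 1.0, 3.0, 7.0])   # hypothetical levels, all differences distinct
S_a, s_cell = len(W), 2

raw = rng.normal(size=(S_a, s_cell)) + 1j * rng.normal(size=(S_a, s_cell))
basis, _ = np.linalg.qr(raw)
E = basis @ basis.conj().T           # projector of one phase cell, trace s_cell

r = rng.random(S_a)
r /= np.linalg.norm(r)               # amplitudes with sum of r^2 equal to u_a = 1
alpha = rng.uniform(0.0, 2.0 * np.pi, size=S_a)

def x_of_t(t):
    """Occupation (E psi_t, psi_t) of the cell at time t."""
    psi_t = r * np.exp(1j * (W * t + alpha))
    return np.vdot(psi_t, E @ psi_t).real

times = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
numeric = np.mean([(x_of_t(t) - s_cell / S_a) ** 2 for t in times])

idx = range(S_a)
off_diag = sum(r[p] ** 2 * r[q] ** 2 * abs(E[p, q]) ** 2
               for p in idx for q in idx if p != q)
diag = sum(r[p] ** 2 * (E[p, p].real - s_cell / S_a) for p in idx) ** 2
formula = off_diag + diag            # closed-form time average

M = max(abs(E[p, q]) ** 2 for p in idx for q in idx if p != q)
N = max((E[p, p].real - s_cell / S_a) ** 2 for p in idx)
```

The numerically averaged value `numeric` agrees with `formula`, and both stay below `M + N`, as the estimate requires for $u_a = 1$.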



2.4 

We regard H (and thus the $W_{\rho,a}$ and $\varphi_{\rho,a}$) as fixed (obeying (76) and (77)),^25 as well as
the $S_a$, $N_a$, $s_{\nu,a}$, and $\Delta_a$; we merely vary the $E_{\nu,a}$ within these boundaries. That is, we
vary the orthogonal system $\omega_{\lambda,\nu,a}$ ($\nu = 1, \ldots, N_a$, $\lambda = 1, \ldots, s_{\nu,a}$), subject only to the
condition

$$\sum_{\nu=1}^{N_a} \sum_{\lambda=1}^{s_{\nu,a}} P_{\omega_{\lambda,\nu,a}} = \Delta_a , \tag{86}$$

and set

$$E_{\nu,a} = \sum_{\lambda=1}^{s_{\nu,a}} P_{\omega_{\lambda,\nu,a}} \tag{87}$$

for $\nu = 1, \ldots, N_a$. Note that all such orthogonal systems $\omega_{\lambda,\nu,a}$ arise from one of them,
say $\bar\omega_{\lambda,\nu,a}$, by unitary transformations (in $\sum_{\nu=1}^{N_a} s_{\nu,a} = S_a$ dimensions since we keep a
fixed). (Think, for example, of the definition of the $P_\varphi$ as matrices in Section 0.3.)

Then the $M_{\nu,a}$ and $N_{\nu,a}$ depend only on the $\omega_{\lambda,\nu,a}$; in fact, they are not as small as we
need them to be for every choice of the latter (and no reasonable condition on $S_a$,
$N_a$, $s_{\nu,a}$ would help with this). For example, if the $\omega_{\lambda,\nu,a}$ coincide with the $\varphi_{\rho,a}$ (where a
is fixed, note that there are $S_a$ of each), one sees that every $(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a})$ assumes [for
some $\rho$] the value 1 among others, and therefore

$$N_{\nu,a} \ge \left(1 - \frac{s_{\nu,a}}{S_a}\right)^2 \ge \frac{1}{4} \tag{88}$$

(provided that, as is always the case, $s_{\nu,a} \le \frac{1}{2}S_a$ for all $\nu$), and therefore

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) \ge N_a \times 2 \times \frac{1}{4} = \frac{N_a}{2} , \tag{89}$$

thus arbitrarily large if $N_a$ is large. The unfavorable result in this case arises, of course,
from the fact that this choice of $\omega_{\lambda,\nu,a}$ does not represent well their physical meaning:
here, the $E_{\nu,a}$ have the same eigenfunctions as H and thus commute with H, which we
expected not to be the case (cf. Section 1.2)!

On the other hand, this behavior is singular and exceptional, and for the overwhelming
majority of the relevant systems $\omega_{\lambda,\nu,a}$ we find the right order of magnitude for $M_{\nu,a}$
and $N_{\nu,a}$. But before we prove this, we would like to get an idea (in an inexact way!)
of what to expect of $M_{\nu,a}$ and $N_{\nu,a}$ in the best case. To this end we proceed as follows.
Instead of averaging

$$M_{\nu,a} = \max_{\substack{\rho,\sigma=1,\ldots,S_a \\ \rho \ne \sigma}} |(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 , \tag{90}$$

$$N_{\nu,a} = \max_{\rho=1,\ldots,S_a} \left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right]^2 \tag{91}$$

over all possible systems $\omega_{\lambda,\nu,a}$ (i.e., of determining which values are predominantly
assumed; the definition of the averaging procedure will be explained in the appendix;
see also the discussion in Section 3.1), we average the

$$|(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 \qquad (\rho \ne \sigma;\ \rho, \sigma = 1, \ldots, S_a) \tag{92}$$

$$\left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right]^2 \qquad (\rho = 1, \ldots, S_a) \tag{93}$$

themselves and then take the maximum. That is, we replace the mean of the maximum
by the maximum of the mean; this leads to wrong, in fact too small (i.e., too favorable)
numbers, but may suffice for the purpose of a first orientation.

^25 These conditions could be relaxed slightly. We could dispense with [(76), i.e.,] the distinctness of
the $W_{\rho,a}$ and demand the following of $W_{\rho,a} - W_{\sigma,a}$ [instead of (77)]: it be possible to partition the set
of all pairs $\rho, \sigma$ with $\rho \ne \sigma$ (where $\rho, \sigma = 1, \ldots, S_a$) into k groups in such a way that within each group
the $W_{\rho,a} - W_{\sigma,a}$ are pairwise distinct. If k is a fixed number for each a and the conditions on the size
of the $S_a$, $N_a$, and $s_{\nu,a}$ that we will specify later are satisfied to a sufficient extent then our conclusion
is not affected. That is, it does no harm if our conditions (76) and (77) are violated in few cases. We
do not give further detail. (In particular, to drop (76) does not gain us much, as $W_{\rho,a} = W_{\sigma,a}$ and
$W_{\rho',a} = W_{\sigma',a}$ together imply that $W_{\rho,a} - W_{\sigma,a} = W_{\rho',a} - W_{\sigma',a}$.)
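As will be shown in the appendix, the averages of

$$|(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 \ (\rho \ne \sigma) , \qquad (E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) , \qquad \left[(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) - \frac{s_{\nu,a}}{S_a}\right]^2 \tag{94}$$

are equal to, respectively,

$$\frac{s_{\nu,a}(S_a - s_{\nu,a})}{S_a(S_a^2 - 1)} , \qquad \frac{s_{\nu,a}}{S_a} , \qquad \frac{s_{\nu,a}(S_a - s_{\nu,a})}{S_a^2(S_a + 1)} , \tag{95}$$

and thus, if (as is the case in practice) $s_{\nu,a} \ll S_a$, approximately equal to, respectively,

$$\frac{s_{\nu,a}}{S_a^2} , \qquad \frac{s_{\nu,a}}{S_a} , \qquad \frac{s_{\nu,a}}{S_a^2} . \tag{96}$$

For $M_{\nu,a}$, $N_{\nu,a}$ we tentatively insert $s_{\nu,a}/S_a^2$, which yields

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) = 2\sum_{\nu=1}^{N_a} \frac{1}{S_a} = \frac{2N_a}{S_a} . \tag{97}$$

This is small when $N_a/S_a$ is small, i.e., when

$$\frac{S_a}{N_a} = \frac{1}{N_a}\sum_{\nu=1}^{N_a} s_{\nu,a} \tag{98}$$

is large. That is, the $s_{\nu,a}$ (i.e., the phase cells) must be large on average. This result is
very reasonable, and we thus proceed to considering the correct average of $M_{\nu,a}$, $N_{\nu,a}$
over the $\omega_{\lambda,\nu,a}$.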
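The three averages quoted above can be compared with a small Monte Carlo experiment. The sketch assumes that "averaging over all systems" means averaging with respect to the uniform (Haar) measure on the unitary group, so that the phase cell is a uniformly random subspace; this reading, the sizes $S_a = 8$, $s_{\nu,a} = 3$, the sample count, and the helper names are assumptions made for the illustration.

```python
import numpy as np

def random_projector(S, s, rng):
    """Projector onto a uniformly random s-dimensional subspace of C^S."""
    g = rng.normal(size=(S, s)) + 1j * rng.normal(size=(S, s))
    q, _ = np.linalg.qr(g)             # orthonormal basis of the random subspace
    return q @ q.conj().T

def sample_averages(S, s, n_samples, seed=0):
    """Monte Carlo estimates of the three averages, with phi = standard basis."""
    rng = np.random.default_rng(seed)
    off, diag = [], []
    for _ in range(n_samples):
        E = random_projector(S, s, rng)
        off.append(abs(E[0, 1]) ** 2)  # |(E phi_rho, phi_sigma)|^2, rho != sigma
        diag.append(E[0, 0].real)      # (E phi_rho, phi_rho)
    diag = np.array(diag)
    return np.mean(off), diag.mean(), np.mean((diag - s / S) ** 2)

S, s = 8, 3
mc_off, mc_diag, mc_var = sample_averages(S, s, n_samples=4000)

exact_off = s * (S - s) / (S * (S ** 2 - 1))
exact_diag = s / S
exact_var = s * (S - s) / (S ** 2 * (S + 1))
```

Within the statistical error of the sample, the three estimates should lie close to the closed-form values; larger sample counts tighten the agreement.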



2.5 

For the average of $M_{\nu,a}$, $N_{\nu,a}$ over all $\omega_{\lambda,\nu,a}$ with

$$\sum_{\nu=1}^{N_a} \sum_{\lambda=1}^{s_{\nu,a}} P_{\omega_{\lambda,\nu,a}} = \Delta_a \tag{99}$$

we will find in the appendix the respective upper bounds

$$\frac{\ln S_a}{S_a} , \qquad \frac{9 s_{\nu,a}\ln S_a}{S_a^2} . \tag{100}$$

We see that they are $S_a \ln S_a / s_{\nu,a}$ times (respectively $9 \ln S_a$ times) larger than the values
used in (97) (keep in mind $1 \ll s_{\nu,a} \ll S_a$); in particular, the first bound is much worse
than the second. It is possible that our bounds can be improved considerably and can
get closer to the values of the previous section. We emphasize this so that readers get
the right picture of the conditions on the sizes of $S_a$, $N_a$, and $s_{\nu,a}$ that we will find: they
are certainly sufficient but perhaps not necessary.

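By inserting the above expressions, we find the average of

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) \tag{101}$$

to be bounded by

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}\left(\frac{\ln S_a}{S_a} + \frac{9 s_{\nu,a}\ln S_a}{S_a^2}\right) = (\ln S_a)\sum_{\nu=1}^{N_a}\left(\frac{1}{s_{\nu,a}} + \frac{9}{S_a}\right) . \tag{102}$$

We introduce the arithmetic and the harmonic mean of the $s_{\nu,a}$ ($\nu = 1, \ldots, N_a$):

$$\bar s_a = \frac{1}{N_a}\sum_{\nu=1}^{N_a} s_{\nu,a} = \frac{S_a}{N_a} , \qquad \frac{1}{\tilde s_a} = \frac{1}{N_a}\sum_{\nu=1}^{N_a} \frac{1}{s_{\nu,a}} . \tag{103}$$

Then the expression (102) equals

$$(\ln S_a)\left(\frac{N_a}{\tilde s_a} + \frac{9}{\bar s_a}\right) . \tag{104}$$

Because of $\tilde s_a \le \bar s_a$ and $N_a \gg 1$ (which amounts to the justified assumption that the
energy surface contains many phase cells), this is approximately equal to

$$(\ln S_a)\frac{N_a}{\tilde s_a} . \tag{105}$$

When is this expression small?

Certainly we must have that $\bar s_a \ge \tilde s_a \gg N_a$ and thus $\ln \bar s_a > \ln N_a$, so we can replace
$\ln S_a = \ln \bar s_a + \ln N_a$ by $\ln \bar s_a$. Therefore, the condition is:

$$(\ln \bar s_a)\frac{N_a}{\tilde s_a} \ll 1 \quad \text{or} \quad \frac{N_a}{\tilde s_a} \ll \frac{1}{\ln \bar s_a} , \tag{106}$$

i.e.,

$$\sum_{\nu=1}^{N_a} \frac{1}{s_{\nu,a}} \ll \frac{1}{\ln \bar s_a} . \tag{107}$$

This means that the $s_{\nu,a}$ must be quite large when compared to their number $N_a$ (i.e.,
the phase cells must be large compared to their number on the energy surface), and not
merely, as assumed in Section 2.4, large compared to unity. We will investigate later
what exactly this means for the distribution of the $s_{\nu,a}$.

We emphasize again the provisional character of our estimates. It is possible that
the above stronger assumption on the size of the phase cells is indeed necessary for
the ergodic theorem and the H-theorem to hold. But maybe it merely arose from the
imperfection of our methods of estimation, and in fact the condition $N_a/S_a \ll 1$ of Section 2.4
is sufficient. It would be of interest to clarify this.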
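To get a feeling for the size of the averaged bound, one can evaluate it for concrete cell sizes. The sketch below computes the arithmetic and harmonic means of the cell sizes, the bound $(\ln S_a)(N_a/\tilde s_a + 9/\bar s_a)$, and its leading part; the function name `averaged_bound` and the two example distributions are assumptions chosen purely for illustration.

```python
import numpy as np

def averaged_bound(cell_sizes):
    """Return (arithmetic mean, harmonic mean, full bound, leading term)
    for one energy surface with the given cell sizes s_{nu,a}."""
    s = np.asarray(cell_sizes, dtype=float)
    N_a, S_a = len(s), s.sum()
    s_bar = S_a / N_a                               # arithmetic mean
    s_tilde = N_a / np.sum(1.0 / s)                 # harmonic mean
    full = np.log(S_a) * (N_a / s_tilde + 9.0 / s_bar)
    leading = np.log(S_a) * N_a / s_tilde
    return s_bar, s_tilde, full, leading

equal_cells = [10 ** 6] * 100                       # 100 cells of equal size
one_tiny_cell = [1] + [10 ** 6] * 99                # same, but one cell of size 1

bar_eq, tilde_eq, full_eq, lead_eq = averaged_bound(equal_cells)
bar_un, tilde_un, full_un, lead_un = averaged_bound(one_tiny_cell)
```

For equal cells the bound is small, whereas a single cell of size 1 already makes the sum of the reciprocals of order 1 and the bound large, in line with the role of the harmonic mean.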



3 Discussion of the Results 
3.1 

We sum up the results so far. We have shown:

Let $\psi$ be an arbitrary state, $\psi_t$ the state arising from $\psi$ after time t ($\gtrless 0$), $U_\psi$ its
micro-canonical ensemble (see Section 1.3), H the energy operator, $W_{\rho,a}$ its eigenvalues
($a = 1, 2, \ldots$; $\rho = 1, \ldots, S_a$; only those with distinct a's can be distinguished macroscopically,
see Section 1.2); both $\psi$ and H are the exact (rather than the macroscopic)
expressions. We assume of H that (for fixed a) all $W_{\rho,a}$ are pairwise distinct, and so are
all $W_{\rho,a} - W_{\sigma,a}$, $\rho \ne \sigma$, i.e., that H has, within a macroscopically inseparable group of
terms, no degeneracies and no resonances with an (imaginary) second equal system.^26
(Infrequent violations of these prohibitions can be tolerated.) Then we obtain, in the
time average, for the expectation value of any macroscopic observable A and for the
entropy:

$$M_t\{(E_A(U_\psi) - E_A(\psi_t))^2\} \le \bar\eta^2 \max_{a=1,2,\ldots}\left(\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a})\right) , \tag{108}$$

$$M_t\{|S(U_\psi) - S(\psi_t)|\} \le \max_{a=1,2,\ldots}\left(\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a})\right) . \tag{109}$$

(Cf. Section 2.3; it suffices to take the maximum over those a whose (macroscopic)
energy surfaces occur in the micro-canonical ensemble (i.e., $u_a = (\Delta_a\psi, \psi) \ne 0$); in
practice this is usually just one a. $\bar\eta^2$ is the micro-canonical average of $A^2$ and thus a
measure of the order of magnitude of the latter.)

The ergodic theorem and the H-theorem hold without exception (i.e., for all $\psi$) if

$$\sum_{\nu=1}^{N_a} \frac{S_a}{s_{\nu,a}}(M_{\nu,a} + N_{\nu,a}) \ \text{are small.} \tag{110}$$

About the validity of this condition, which involves, apart from $S_a$, $N_a$, $s_{\nu,a}$ (and $\Delta_a$),
also the $\omega_{\lambda,\nu,a}$ (in the $M_{\nu,a}$, $N_{\nu,a}$), we can say this: If

$$\sum_{\nu=1}^{N_a} \frac{1}{s_{\nu,a}} \ll \frac{1}{\ln \bar s_a} , \tag{111}$$

i.e., if the phase cells $E_{\nu,a}$ are large compared to their number on an energy surface $\Delta_a$,
then (110) is satisfied for the overwhelming majority of the $\omega_{\lambda,\nu,a}$, i.e., the average over
$\omega_{\lambda,\nu,a}$ of $\sum_{\nu=1}^{N_a} (S_a/s_{\nu,a})(M_{\nu,a} + N_{\nu,a})$ is small.^27

The real condition (110) for the validity of the two theorems can be violated also
when (111) holds, i.e., also in this case the macroscopic technique of measurement (the
$\omega_{\lambda,\nu,a}$) can be chosen in such a way that the two theorems do not hold. However,
for the overwhelming majority of the macroscopic settings, both theorems hold without
exceptions (i.e., for all $\psi$ and A).

^26 Namely, when $W_{\rho,a} - W_{\sigma,a} = W_{\rho',a} - W_{\sigma',a}$ then [the product of] the state $\varphi_{\rho,a}$ in the first system
and the state $\varphi_{\sigma',a}$ in the second system has the same total energy as [that of] $\varphi_{\rho',a}$ in the first and
$\varphi_{\sigma,a}$ in the second.



3.2 

Let us study (111) more carefully. If all $s_{\nu,a}$ (for a fixed a) were of roughly equal size
then (111) would amount to $N_a/\bar s_a \ll 1/\ln \bar s_a$ or $\bar s_a/\ln \bar s_a \gg N_a$; that is, just a little
more than the condition $\bar s_a \gg N_a$, which is the statement that the phase cells are large
compared to their number on the energy surface. If, on the other hand, the sizes of the
$s_{\nu,a}$ are substantially different then we need to be very cautious: already a single $s_{\nu,a}$ that
is not $\gg 1$ will have the effect that $\sum_{\nu} (1/s_{\nu,a})$ is not $\ll 1$, and thus that our condition
(111) is violated. On the other hand, the $s_{\nu,a}$ are very different from one another, as
$\ln s_{\nu,a}$ is to be understood as the entropy of the mixture $(1/s_{\nu,a})E_{\nu,a}$ characterizing a
general system in the phase cell $E_{\nu,a}$,^28 and it suffices to recall the situation in the
theory of gases to appreciate that one energy surface will usually contain phase cells
with very different entropies. (This fact makes the H-theorem a relevant statement.) If
the greatest difference in (macroscopically perceptible) entropy among the cells is $\sigma$, so
that always

$$|\ln s_{\nu,a} - \ln s_{\mu,a}| \le \sigma , \tag{112}$$

then

$$s_{\nu,a} \ge \bar s_a e^{-\sigma} \tag{113}$$

and

$$\sum_{\nu=1}^{N_a} \frac{1}{s_{\nu,a}} \le \frac{N_a e^{\sigma}}{\bar s_a} , \tag{114}$$

which leads us to the condition

$$\frac{\bar s_a}{\ln \bar s_a} \gg e^{\sigma}N_a . \tag{115}$$

This relation shows that no danger arises: since the smallness of $\hbar$ affects the left
hand side (because $\bar s_a \to \infty$ as $\hbar \to 0$, see Section 0.6) but not the right, (115) will
normally be satisfied. We believe that further discussion is not necessary.

^27 Note: what we have shown is not that for every given $\psi$ or A the ergodic theorem and the
H-theorem hold for most $\omega_{\lambda,\nu,a}$ but that for most $\omega_{\lambda,\nu,a}$ they are universally valid, i.e., for all $\psi$ and A.
The latter is, of course, much more [i.e., much stronger] than the former.

^28 This follows from our considerations above or, alternatively, from Boltzmann's definition of entropy,
as the phase cell



It remains to discuss the significance of the conditions (1761) and (1771) on the eigenvalues 
of H by exemplifying them using the known classical examples and counterexamples to 
the ergodic theorem and the if-theorem. 

Let K he a. box in which N corpuscles ki, . . . ,kN move around, i.e., a gas; we make 
one of the following two assumptions: either 

a) that there is no interaction between the particles, not even collisions (i.e., that 
they pass through each other); or 

/3) that there are interaction and collisions. 

In case a, it is known that the two theorems do not hold (as any distribution of 
speeds, not just the Maxwellian, persists for an arbitrarily long time); in case /3, in 
contrast, one expects the theorems to hold. (The situation is completely analogous for 
radiation in a cavity with reflecting walls.) How can this behavior be understood from 
the perspective of our conditions? 

Since the Sa, Na, s^^a, and E^^a are hardly affected by the difference between a and 
(3, the condition on H must be relevant. Let us first consider each particle on its own 
in K, and let its energy eigenvalues be £i, £2, • • -S Then, the energy eigenvalues of the 
total system in K are, in case a, the expressions of the form 



with Zi, = 0,1,... and J2T=i = N, while in case (3 they are slightly modified — the 
less so the weaker the interaction is. The identity of the particles would lead in general 
to an iV!-fold "permutation degeneracy, "@ and thus to a violation of the first condition 
( 1761) on the energy eigenvalues, but since either Fermi-Dirac or Bose-Einstein statistics 
apply, i.e., since only wave functions that are anti-symmetric respectively symmetric 
are admissible [HI E], these degeneracies disappearo Thus, no such difficulty arises. 

^39 We assume that [the particles] $k_1, \ldots, k_N$ are identical and in principle indistinguishable. If they are
distinguishable then every [particle] $k_n$ ($n = 1, \ldots, N$) possesses a different term spectrum $\varepsilon_{n1}, \varepsilon_{n2}, \ldots$.
The situation is similar to the one we are describing, except that the danger of degeneracy vanishes; α
still conflicts with the second condition (77) on the eigenvalues of $H$ while β does not.

^40 In the case α. In the case β, the degrees of degeneracy are the degrees of the irreducible representations
of the symmetric group of $N$ elements. Cf. [231 [23 12S].

^41 In the case of Fermi-Dirac statistics, only $z_n = 0, 1$ are admissible, but this does not affect our
considerations.



3.3 



However, in the case α numerous relations of the type excluded by the second condition
(77) hold:

$$(\varepsilon_1 + \varepsilon_3 + \ldots) - (\varepsilon_2 + \varepsilon_3 + \ldots) = (\varepsilon_1 + \varepsilon_4 + \ldots) - (\varepsilon_2 + \varepsilon_4 + \ldots) \quad \text{etc.} \tag{117}$$

In the case β this does not happen because the four above terms of $K$ will be perturbed
in very different ways, and, obviously, the absolute magnitude of the perturbation (i.e.,
of the interaction) does not matter.

Thus, it is the behavior with respect to the condition (77) that constitutes the reason
for the different character of α and β.
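
The repeated energy differences of type (117) can be made concrete by a small enumeration. The following is only an illustrative sketch, not part of the original argument. It assumes Bose-type occupation numbers $z_n$, a hypothetical truncation to a few single-particle levels, and rationally independent $\varepsilon_n$, so that two total energies differ by the same amount exactly when their occupation vectors differ by the same integer vector. The function names are invented for this example.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations


def occupations(n_levels, n_particles):
    """All occupation vectors (z_1, ..., z_n_levels) with sum z_n = n_particles."""
    vectors = []
    for combo in combinations_with_replacement(range(n_levels), n_particles):
        z = [0] * n_levels
        for level in combo:
            z[level] += 1
        vectors.append(tuple(z))
    return vectors


def repeated_gaps(n_levels, n_particles):
    """Map each occupation difference z - z' to the number of ordered pairs of
    distinct total-energy levels realizing it; only repeated ones are kept.

    Assumption of this sketch: the eps_n are rationally independent, so equal
    energy differences correspond exactly to equal occupation differences.
    """
    states = occupations(n_levels, n_particles)
    counts = Counter(
        tuple(a - b for a, b in zip(z1, z2))
        for z1, z2 in permutations(states, 2)
    )
    return {gap: c for gap, c in counts.items() if c > 1}


if __name__ == "__main__":
    # Two non-interacting particles on four levels: the difference
    # eps_1 - eps_2 is realized by several pairs of total-energy levels.
    print(repeated_gaps(n_levels=4, n_particles=2)[(1, -1, 0, 0)])
    # A single particle: no energy difference is repeated.
    print(repeated_gaps(n_levels=4, n_particles=1))
```

In this toy setting the repetition disappears only when the levels stop being plain sums of single-particle energies, which is what the interaction in case β is expected to achieve.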

A Appendix 
A.1

The properties used in Sections 2.4 and 2.5 of the distributions of

$$|(E_{\nu,a}\varphi_{\rho,a}, \varphi_{\sigma,a})|^2 \ (\rho \neq \sigma) \quad \text{and} \quad (E_{\nu,a}\varphi_{\rho,a}, \varphi_{\rho,a}) \tag{118}$$

need to be established. But first we need to explain the sense in which we speak of a
statistical distribution.

As we have pointed out in Section 2.4, everything that depends on $E_{\nu,a}$ ultimately
depends on the $\omega_{\lambda,\nu,a}$, and the average we have in mind is the average over these $\omega_{\lambda,\nu,a}$.
Since Sa, Na, s^^a and are given, they are bound to the condition 

EEP-w = '^'^ (119) 

u=l A=l 

and determine, in turn, the $E_{\nu,a}$ according to

$$\sum_{\lambda=1}^{s_{\nu,a}} P_{\omega_{\lambda,\nu,a}} = E_{\nu,a} . \tag{120}$$

We have also mentioned that all such [orthonormal] systems can be obtained from one of
them, say $\bar\omega_{\lambda,\nu,a}$, by unitary-linear transformations. Thus, if we choose $\bar\omega_{\lambda,\nu,a}$ in whichever
way, we can equivalently say that we average over the set of the unitary matrices in
$\sum_{\nu=1}^{N_a} s_{\nu,a} = S_a$ dimensions; they map the $\bar\omega_{\lambda,\nu,a}$ to the $\omega_{\lambda,\nu,a}$ ($a$ is fixed!). We should
denote these matrices by $\{\xi_{\lambda,\nu|\lambda',\nu'}\}$, using for their rows a double index $\lambda, \nu$ and likewise
$\lambda', \nu'$ for their columns, corresponding to the notation $\omega_{\lambda,\nu,a}$ and $\bar\omega_{\lambda,\nu,a}$ and the relation

$$\omega_{\lambda,\nu,a} = \sum_{\nu'=1}^{N_a} \sum_{\lambda'=1}^{s_{\nu',a}} \xi_{\lambda,\nu|\lambda',\nu'}\, \bar\omega_{\lambda',\nu',a} . \tag{121}$$

We prefer, however, to introduce for them the notation $\xi_{\rho|\rho'}$ ($\rho, \rho' = 1, \ldots, S_a$). Now
we need to explain how to average over the set of the $S_a$-dimensional unitary matrices
$\{\xi_{\rho|\rho'}\}$.
We wish to average in a way that does not prefer any reference frame $\bar\omega_{\lambda,\nu,a}$ to the
others. If $\bar{\bar\omega}_{\lambda,\nu,a}$ is another such reference frame and

$$\bar\omega_{\lambda,\nu,a} = \sum_{\nu'=1}^{N_a} \sum_{\lambda'=1}^{s_{\nu',a}} \zeta_{\lambda,\nu|\lambda',\nu'}\, \bar{\bar\omega}_{\lambda',\nu',a} \tag{122}$$

(we also rewrite $\zeta_{\lambda,\nu|\lambda',\nu'}$ as $\zeta_{\rho|\rho'}$), then the matrices $\{\xi_{\rho|\rho'}\}$ and $\{\xi'_{\rho|\rho'}\}$ that represent the
[orthonormal] system $\omega_{\lambda,\nu,a}$ relative to $\bar\omega_{\lambda,\nu,a}$ respectively $\bar{\bar\omega}_{\lambda,\nu,a}$ are related according to

$$\xi'_{\rho|\rho''} = \sum_{\rho'=1}^{S_a} \xi_{\rho|\rho'}\, \zeta_{\rho'|\rho''} . \tag{123}$$

Thus, the procedure of averaging must be invariant under transformations of the above
form $\{\xi_{\rho|\rho'}\} \to \{\xi'_{\rho|\rho'}\}$ (for every fixed unitary matrix $\{\zeta_{\rho|\rho'}\}$) [i.e., under right multiplication].
Such a procedure of averaging over the unitary group does exist, is uniquely
determined by the above requirement, [amounts to integration relative to a measure now
known as the Haar measure on the unitary group] and has been specified by Weyl [22].
His general formulas we will not need, as we can reach our goals just by means of the invariance
properties of this averaging procedure. We mention that (as shown in [22]) this
averaging procedure is also invariant under [left multiplication, i.e.,] the transformation
$\{\xi_{\rho|\rho'}\} \to \{\xi''_{\rho|\rho'}\}$ defined by the relation $\{\xi''_{\rho|\rho'}\} = \{\zeta_{\rho|\rho'}\}\{\xi_{\rho|\rho'}\}$ [i.e., $\xi'' = \zeta\xi$], i.e.,

$$\xi''_{\rho|\rho''} = \sum_{\rho'=1}^{S_a} \zeta_{\rho|\rho'}\, \xi_{\rho'|\rho''} . \tag{124}$$

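Second, for our calculations we simplify the notation. Since the order of the $\nu =
1, \ldots, N_a$ is without significance, it suffices to consider $E_{1,a}$. When replacing the two
indices $\lambda, \nu$ by one index $\rho$ we can arrange that $(\lambda, 1)$ corresponds to $\rho = 1, \ldots, s_{1,a}$.
Furthermore, we select the reference frame $\bar\omega_{\lambda,\nu,a}$: let it be the system of the $\varphi_{\rho,a}$ (where
we have also replaced the indices). We thus have that

$$(E_{1,a}\varphi_{\rho,a}, \varphi_{\sigma,a}) = \sum_{\tau=1}^{s_{1,a}} (P_{\omega_{\tau,a}}\varphi_{\rho,a}, \varphi_{\sigma,a}) \tag{125}$$

$$= \sum_{\tau=1}^{s_{1,a}} (\varphi_{\rho,a}, \omega_{\tau,a})(\omega_{\tau,a}, \varphi_{\sigma,a}) = \sum_{\tau=1}^{s_{1,a}} \xi^*_{\tau|\rho}\, \xi_{\tau|\sigma} . \tag{126}$$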

Finally, we omit the unnecessary indices i^a, so that Sa, Na, Sia, A^, Ei q, ipp^a, ^i,a, ^i,a 
will be written as S, N, s. A, E, v?p, M, N@ 

^^Note of the translator: Note the difference between N and N: = Na is the number of macro- 
states, N — a is one of the error bounds. 
Our task is now: As $\{\xi_{\rho|\rho'}\}$ runs through all $S$-dimensional unitary matrices, investigate
the distributions, with respect to the [measure corresponding to the] averaging
procedure sketched above, of

$$|(E\varphi_\rho, \varphi_\sigma)|^2 = \Bigl|\sum_{\tau=1}^{s} \xi^*_{\tau|\rho}\, \xi_{\tau|\sigma}\Bigr|^2 \quad (\rho \neq \sigma) \tag{127}$$

and

$$(E\varphi_\rho, \varphi_\rho) = \sum_{\tau=1}^{s} |\xi_{\tau|\rho}|^2 . \tag{128}$$

A.2 

We begin with an auxiliary reasoning. We determine the distribution of the values of

$$\sum_{\rho=1}^{s} x_\rho^2 \tag{129}$$

as the vector $\{x_1, \ldots, x_S\}$ runs through the unit sphere

$$\sum_{\rho=1}^{S} x_\rho^2 = 1 , \tag{130}$$

at first with real $x_\rho$. That is, we determine $W(u)$, where $W(u)\,du$ is the (geometric)
probability for

$$u \le \sum_{\rho=1}^{s} x_\rho^2 \le u + du \tag{131}$$

($0 \le u \le 1$).^43 Simple geometrical considerations that we need not reproduce here show
that $W(u)$ is proportional to

$$u^{s/2-1}(1-u)^{(S-s)/2-1} , \tag{132}$$

where the proportionality factor needs to be determined from

$$\int_0^1 W(u)\,du = 1 . \tag{133}$$

Now, if we allow $x_1, \ldots, x_S$ to be complex and consider

$$u \le \sum_{\rho=1}^{s} |x_\rho|^2 \le u + du \tag{134}$$

^43 This amounts to determining the surface area of the $s$-dimensional calotte on the $S$-dimensional
unit sphere.
instead of (131) and

$$\sum_{\rho=1}^{S} |x_\rho|^2 = 1 \tag{135}$$

instead of (130), then we realize that the problem has not changed as we can regard the
real and imaginary parts of the $x_\rho$ as real Cartesian coordinates. Thus, we only need to
replace $s, S$ by $2s, 2S$, so $W(u)$ becomes proportional to

$$u^{s-1}(1-u)^{S-s-1} , \tag{136}$$

and the proportionality factor can be determined from the normalization condition to
be

$$\frac{(S-1)!}{(s-1)!\,(S-s-1)!} . \tag{137}$$

Therefore,

$$\text{the average of } \Bigl(\sum_{\rho=1}^{s} |x_\rho|^2\Bigr)^n = \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \int_0^1 u^{s-1}(1-u)^{S-s-1}\, u^n\,du \tag{138}$$

$$= \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \int_0^1 u^{s+n-1}(1-u)^{S-s-1}\,du \tag{139}$$

$$= \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \cdot \frac{(s+n-1)!\,(S-s-1)!}{(S+n-1)!} = \frac{s(s+1)\cdots(s+n-1)}{S(S+1)\cdots(S+n-1)} . \tag{140}$$
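
The moment formula (140) can be checked in exact rational arithmetic. The following sketch is only an illustration with made-up function names: it evaluates the normalization (137) times the Beta integral of (139), both written as factorials, and compares the result with the product form on the right of (140).

```python
from fractions import Fraction
from math import factorial


def moment_by_product(s, S, n):
    """Right-hand side of (140): s(s+1)...(s+n-1) / (S(S+1)...(S+n-1))."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(s + k, S + k)
    return result


def moment_by_factorials(s, S, n):
    """Normalization (137) times the Beta integral in (139)."""
    norm = Fraction(factorial(S - 1), factorial(s - 1) * factorial(S - s - 1))
    # Integral of u^(s+n-1) (1-u)^(S-s-1) over [0, 1], as a ratio of factorials.
    integral = Fraction(
        factorial(s + n - 1) * factorial(S - s - 1), factorial(S + n - 1)
    )
    return norm * integral


if __name__ == "__main__":
    for n in range(4):
        print(n, moment_by_product(3, 10, n), moment_by_factorials(3, 10, n))
```

For $n = 1$ and $n = 2$ this reproduces the values $s/S$ and $s(s+1)/(S(S+1))$ used in Appendix A.3.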

A.3 

We return to the unitary matrix $\{\xi_{\rho|\rho'}\}$ and introduce the abbreviation

$$e_{\rho\sigma} = (E\varphi_\rho, \varphi_\sigma) \tag{141}$$

$$= \sum_{\tau=1}^{s} \xi^*_{\tau|\rho}\, \xi_{\tau|\sigma} . \tag{142}$$

For the reasons described in Appendix A.1, all $e_{\rho\sigma}$ ($\rho \neq \sigma$) have the same probability
distribution, and likewise all $e_{\rho\rho}$.^44
In

$$e_{\rho\rho} = \sum_{\tau=1}^{s} |\xi_{\tau|\rho}|^2 \tag{143}$$

only the $\rho$-th column of $\{\xi_{\rho|\rho'}\}$ appears, over which we can average in the same way as
we averaged over the unit sphere in Appendix A.2 [i.e., whose distribution is uniform

^44 The interchange of columns and that of rows belongs to the transformations there [under which
the Haar measure is invariant].
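on the unit sphere] (this follows easily from the invariance properties of the averaging
procedure). Thus (denoting the average by $\mathfrak{M}$),

$$\mathfrak{M}(e_{\rho\rho}) = \frac{s}{S} , \qquad \mathfrak{M}(e_{\rho\rho}^2) = \frac{s(s+1)}{S(S+1)} , \tag{144}$$

$$\mathfrak{M}\Bigl(\Bigl(e_{\rho\rho} - \frac{s}{S}\Bigr)^2\Bigr) = \mathfrak{M}(e_{\rho\rho}^2) - \frac{2s}{S}\,\mathfrak{M}(e_{\rho\rho}) + \frac{s^2}{S^2} \tag{145}$$

$$= \frac{s(s+1)}{S(S+1)} - \frac{s^2}{S^2} = \frac{s(S-s)}{S^2(S+1)} . \tag{146}$$

Furthermore, $E^2 = E$ implies

$$e_{\rho\rho} = \sum_{\sigma=1}^{S} |e_{\rho\sigma}|^2 = e_{\rho\rho}^2 + \sum_{\sigma=1,\ \sigma\neq\rho}^{S} |e_{\rho\sigma}|^2 . \tag{147}$$

Due to the equality of the $\mathfrak{M}(|e_{\rho\sigma}|^2)$ ($\rho \neq \sigma$), we have that

$$\mathfrak{M}(|e_{\rho\sigma}|^2) = \frac{1}{S-1}\bigl(\mathfrak{M}(e_{\rho\rho}) - \mathfrak{M}(e_{\rho\rho}^2)\bigr) \tag{148}$$

$$= \frac{1}{S-1}\Bigl(\frac{s}{S} - \frac{s(s+1)}{S(S+1)}\Bigr) = \frac{s(S-s)}{S(S^2-1)} . \tag{149}$$

The averages used in Section 2.4 have thus been determined in agreement with the
values used there.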
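
The averages (144) and (149) can also be checked by sampling. The sketch below is an illustration under stated assumptions, not the procedure of the text: it draws two columns of a random unitary matrix by orthonormalizing independent complex Gaussian vectors (a standard way to obtain the distribution invariant under unitary transformations), and it uses only the first $s$ components of each column, as in (141) and (143). All function names are invented for this example.

```python
import random


def random_unit_vector(dim, rng):
    """Uniform point on the unit sphere of C^dim (normalized complex Gaussian)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(c) ** 2 for c in v) ** 0.5
    return [c / norm for c in v]


def two_orthonormal_columns(dim, rng):
    """Two columns xi, eta, distributed like two distinct columns of a random unitary."""
    xi = random_unit_vector(dim, rng)
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    overlap = sum(x.conjugate() * y for x, y in zip(xi, g))
    g = [y - overlap * x for x, y in zip(xi, g)]  # remove the component along xi
    norm = sum(abs(c) ** 2 for c in g) ** 0.5
    return xi, [c / norm for c in g]


def sample_averages(s, S, n_samples, seed=0):
    """Sample means of e_rr, e_rr^2 and |e_rs|^2 (r != s)."""
    rng = random.Random(seed)
    sum_diag = sum_diag_sq = sum_off = 0.0
    for _ in range(n_samples):
        xi, eta = two_orthonormal_columns(S, rng)
        e_diag = sum(abs(c) ** 2 for c in xi[:s])
        e_off = sum(x.conjugate() * y for x, y in zip(xi[:s], eta[:s]))
        sum_diag += e_diag
        sum_diag_sq += e_diag ** 2
        sum_off += abs(e_off) ** 2
    return sum_diag / n_samples, sum_diag_sq / n_samples, sum_off / n_samples


if __name__ == "__main__":
    s, S = 2, 6
    print(sample_averages(s, S, 20000, seed=1))
    print(s / S, s * (s + 1) / (S * (S + 1)), s * (S - s) / (S * (S * S - 1)))
```

The second printed line lists the exact values from (144) and (149), so that the two lines can be compared directly; agreement is only expected up to sampling error.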

Now we turn to investigating the distributions of

$$|e_{\rho\sigma}|^2 \ (\rho \neq \sigma) \quad \text{and} \quad \Bigl(e_{\rho\rho} - \frac{s}{S}\Bigr)^2 \tag{150}$$

in order to determine the averages of $M$ and $N$ as in Section 2.5.

A.4 

The latter problem is the easier one. We know already that $u \le e_{\rho\rho} \le u + du$ (with
$0 \le u \le 1$) has probability $W(u)\,du$ (see Appendix A.2). Let $a$ be a positive number
with $a \ll s^2/S^2$; then the probability of

$$(e_{\rho\rho} - s/S)^2 > a \tag{151}$$

(note that the left hand side is certainly less than or equal to 1, as $0 \le e_{\rho\rho} \le 1$) is

$$\Bigl(\int_0^{s/S-\sqrt{a}} + \int_{s/S+\sqrt{a}}^{1}\Bigr) W(u)\,du = \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \Bigl(\int_0^{s/S-\sqrt{a}} + \int_{s/S+\sqrt{a}}^{1}\Bigr) u^{s-1}(1-u)^{S-s-1}\,du . \tag{152}$$

The derivative of the logarithm of the integrand equals

$$\frac{s-1}{u} - \frac{S-s-1}{1-u} = \frac{1}{u(1-u)}\bigl([s-1] - [S-2]u\bigr) , \tag{153}$$

so the integrand increases when $u$ approaches $(s-1)/(S-2)$ from either side. This
point lies to the left of $s/S$, in fact by an amount of^45

$$\frac{s}{S} - \frac{s-1}{S-2} = \frac{S-2s}{S(S-2)} \le \frac{1}{S} , \tag{154}$$

and thus still lies in the interval $s/S \pm \sqrt{a}$ provided $a \ge 1/S^2$. Therefore, within the
domain of integration, the integrand assumes its maximum at $u = s/S \pm \sqrt{a}$ (we will not
try to find out at which of the two values). We can thus estimate the entire expression
(152) as being

$$\le \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \Bigl(\frac{s}{S} \pm \sqrt{a}\Bigr)^{s-1} \Bigl(1 - \frac{s}{S} \mp \sqrt{a}\Bigr)^{S-s-1} . \tag{155}$$
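Now we use the assumption $1 \ll s \ll S$, which implies that the first factor is, by
Stirling's formula, approximately equal to^46

$$\frac{1}{e\sqrt{2\pi}}\, \sqrt{S}\, \Bigl(\frac{S}{s}\Bigr)^{s-\frac12} \Bigl(\frac{S}{S-s}\Bigr)^{S-s-\frac12} , \tag{156}$$

while the second is approximately equal to

$$\frac{S}{s} \Bigl(\frac{s}{S} \pm \sqrt{a}\Bigr)^{s} \Bigl(1 - \frac{s}{S} \mp \sqrt{a}\Bigr)^{S-s} . \tag{157}$$

The entire expression (155) is therefore approximately equal to

$$\frac{S}{e\sqrt{2\pi s}} \Bigl(1 \pm \frac{S}{s}\sqrt{a}\Bigr)^{s} \Bigl(1 \mp \frac{S}{S-s}\sqrt{a}\Bigr)^{S-s} \tag{158}$$

$$= \frac{S}{e\sqrt{2\pi s}} \exp\Bigl[ s \ln\Bigl(1 \pm \frac{S}{s}\sqrt{a}\Bigr) + (S-s) \ln\Bigl(1 \mp \frac{S}{S-s}\sqrt{a}\Bigr) \Bigr] . \tag{159}$$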
^45 Note of the translator: In the German original, Eq. (154) is misprinted as



s-l 5-25 1 
< — , 



S S-l S{S-1) - s 

^46 Note of the translator: In the German original, the second exponent in this expression is misprinted
as $S - s$, and the factor $1/e = \exp(-1)$, which is as irrelevant as the $\sqrt{2\pi}$ to the purpose at hand, is
missing here and in the following.
The exponent is [because $\ln(1+x) \le x - x^2/2 + x^3/3$ and $\ln(1+x) \le x$] less than or
equal to

$$\pm s\,\frac{S\sqrt{a}}{s} - s\,\frac{S^2 a}{2s^2} \pm s\,\frac{S^3 a\sqrt{a}}{3s^3} \mp (S-s)\,\frac{S}{S-s}\sqrt{a} \tag{160}$$

$$= -\frac{S^2 a}{2s} \pm \frac{S^3 a\sqrt{a}}{3s^2} . \tag{161}$$

Since $S\sqrt{a}/s \ll 1$, the second term is small compared to the first, and thus the expression
(155) is

$$\le \frac{S}{e\sqrt{2\pi s}}\, e^{-\Theta \frac{S^2 a}{2s}} \tag{162}$$

($\Theta$ some number less than 1).

This concerned the probability of $(e_{\rho\rho} - s/S)^2 > a$ for a fixed $\rho = 1, \ldots, S$; the
probability that this event occurs for some $\rho$, i.e., the probability of

$$N = \max_{\rho=1,\ldots,S} \Bigl(e_{\rho\rho} - \frac{s}{S}\Bigr)^2 > a , \tag{163}$$

is at most $S$ times larger, and thus

$$\le \frac{S^2}{e\sqrt{2\pi s}}\, e^{-\Theta \frac{S^2 a}{2s}} . \tag{164}$$

Now we estimate the average of $N$ in two parts: for values in $[0, a]$, the probability is at
most 1, for values in $[a, 1]$ we have the above bound. Therefore,

$$\mathfrak{M}(N) \le a + \frac{S^2}{e\sqrt{2\pi s}}\, e^{-\Theta \frac{S^2 a}{2s}} . \tag{165}$$

Here, $a$ can be chosen to be any number such that $a \ge 1/S^2$ and $a \ll s^2/S^2$; we choose

$$a = \frac{8 s \ln S}{S^2} . \tag{166}$$

(This satisfies everything, provided $s \gg \ln S$, which must be the case anyway by condition
(107).^47) Our upper bound thus becomes

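$$\frac{8 s \ln S}{S^2} + \frac{S^2}{e\sqrt{2\pi s}}\, e^{-4\Theta \ln S} = \frac{8 s \ln S}{S^2} + \frac{S^{2-4\Theta}}{e\sqrt{2\pi s}} . \tag{167}$$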

Thus, if the premise $1 \ll s \ll S$ is satisfied to a sufficient extent, the above average is
certainly less than or equal to $9 s \ln S/S^2$.
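
The first step of this estimate, the bound (155) on the tail probability (152), can be checked numerically for moderate $s$ and $S$. The sketch below is an illustration, not part of the original proof. It relies on the standard identity that, for integer parameters, the distribution function of the density (136) with normalization (137) equals a binomial tail sum. The function names and the sample values of $a$, $s$, $S$ are chosen freely for this example; (155) only requires $a \ge 1/S^2$.

```python
from math import comb, factorial, sqrt


def beta_cdf(x, s, S):
    """P(e <= x) for the density (136) with normalization (137), written as the
    binomial-sum form of the regularized incomplete beta function."""
    n = S - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(s, n + 1))


def tail_probability(a, s, S):
    """Value of (152): the probability that (e - s/S)^2 > a."""
    lo, hi = s / S - sqrt(a), s / S + sqrt(a)
    left = beta_cdf(lo, s, S) if lo > 0 else 0.0
    right = 1.0 - beta_cdf(hi, s, S) if hi < 1 else 0.0
    return left + right


def endpoint_bound(a, s, S):
    """Right-hand side of (155), taking the larger of the two sign choices."""
    # (S-1)! / ((s-1)! (S-s-1)!) is an integer, so floor division is exact.
    norm = factorial(S - 1) // (factorial(s - 1) * factorial(S - s - 1))
    values = []
    for u in (s / S - sqrt(a), s / S + sqrt(a)):
        if 0 < u < 1:
            values.append(norm * u ** (s - 1) * (1 - u) ** (S - s - 1))
    return max(values)


if __name__ == "__main__":
    a, s, S = 0.0025, 20, 200
    print(tail_probability(a, s, S), endpoint_bound(a, s, S))
```

The bound (155) is crude for such small parameters (it may even exceed 1); it becomes useful only in the regime $1 \ll s \ll S$ treated in the text.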



^'^From J2u=i ^hv,a ^ l/lnsa follows s^^a ^ InSa. Put differently, see (|105p . TValnS'a/sa <C 1, [or, 
equivalently,] Sa^^Saj (saSa) ^ 1 so a fortiori Sals\ < 1, Sa > InSa > ^InS'^. Thus, we have 

that Si,a ^ In i.e., s ^ ln5. 
A.5

It remains to discuss the distribution of $|e_{\rho\sigma}|^2$ ($\rho \neq \sigma$). We denote the $\rho$-th and the
$\sigma$-th column of $\{\xi_{\tau|\tau'}\}$ by $\xi = \{\xi_{1|\rho}, \ldots, \xi_{S|\rho}\}$ and $\eta = \{\xi_{1|\sigma}, \ldots, \xi_{S|\sigma}\}$; in addition, let
$\bar\xi = \{\xi_{1|\rho}, \ldots, \xi_{s|\rho}, 0, \ldots, 0\}$. For such vectors $\zeta = \{\zeta_1, \ldots, \zeta_S\}$, $\chi = \{\chi_1, \ldots, \chi_S\}$ we will
also use the notation

$$(\zeta, \chi) = \sum_{\tau=1}^{S} \zeta_\tau\, \chi^*_\tau . \tag{168}$$

We have that

$$|e_{\rho\sigma}|^2 = |(\bar\xi, \eta)|^2 , \tag{169}$$

where the vectors $\xi$, $\eta$, being columns of a unitary matrix, are subject to the conditions
$|\xi| = 1$, $|\eta| = 1$, $(\xi, \eta) = 0$ (i.e., both lie on the unit sphere and are orthogonal to each
other).

We decompose $\bar\xi$ into a component parallel to $\xi$ and one orthogonal to $\xi$:

$$\bar\xi = (\bar\xi, \xi)\,\xi + \tilde\xi . \tag{170}$$

Then we can just as well write

$$|e_{\rho\sigma}|^2 = |(\tilde\xi, \eta)|^2 . \tag{171}$$



When keeping $\xi$ (and $\bar\xi$, $\tilde\xi$) fixed, we thus have two vectors $\tilde\xi$, $\eta$ orthogonal to $\xi$, of which
the first is fixed and the second can vary freely on the surface of a $(S-1)$-dimensional
unit ball. We introduce an arbitrary $(S-1)$-dimensional Cartesian coordinate system
for this [subspace], let

$$\eta = (y_1, \ldots, y_{S-1}) . \tag{172}$$

From the unitary invariance of our averaging procedure follows that the procedure
amounts (for fixed $\xi = \{\xi_{1|\rho}, \ldots, \xi_{S|\rho}\}$) exactly to averaging $\eta$ over the $(S-2)$-dimensional
unit sphere^48 as described in Appendix A.2. Moreover, due to the unitary invariance,
the only thing that matters about $\tilde\xi$ is its length $|\tilde\xi|$, so we can replace it by

$$\tilde\xi = \{|\tilde\xi|, 0, \ldots, 0\} \tag{173}$$

(in $S-1$ dimensions). That is why we first aim at determining the distribution of

$$|(\tilde\xi, \eta)|^2 = |\tilde\xi|^2\, |y_1|^2 \tag{174}$$

for [random $\eta$ with]

$$\sum_{\pi=1}^{S-1} |y_\pi|^2 = 1 . \tag{175}$$

^48 Note of the translator: The German original literally says here: over the $(S-1)$-dimensional unit
ball.
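That (174) lies in $[u, u + du]$ ($0 \le u \le |\tilde\xi|^2$) means that

$$\frac{u}{|\tilde\xi|^2} \le |y_1|^2 \le \frac{u}{|\tilde\xi|^2} + \frac{du}{|\tilde\xi|^2} , \tag{176}$$

which has probability

$$W\Bigl(\frac{u}{|\tilde\xi|^2}\Bigr) \frac{du}{|\tilde\xi|^2} , \tag{177}$$

where $W$ is given by (136) with $s, S$ replaced by $1, S-1$. Thus, the coefficient of $du$ is^49

$$(S-2)\, \frac{(|\tilde\xi|^2 - u)^{S-3}}{|\tilde\xi|^{2(S-2)}} . \tag{178}$$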

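While we had kept $\xi$ fixed up to now, we will now average (178) (of course, in the sense
of Appendix A.2 [i.e., using a uniform distribution of $\xi$]) over the ($S$-dimensional)
unit sphere. The expression for the distribution of $|e_{\rho\sigma}|^2$ for given $\xi$ depends only on
$|\tilde\xi|^2$, and (since $\bar\xi$ is orthogonal to $\xi - \bar\xi$, and $\xi$ is orthogonal to $\tilde\xi = \bar\xi - (\bar\xi, \xi)\xi$) we have that

$$|\bar\xi|^2 = (\bar\xi, \bar\xi) = (\bar\xi, \xi) , \tag{179}$$

$$|\bar\xi|^2 = |(\bar\xi, \xi)\,\xi|^2 + |\tilde\xi|^2 = |\bar\xi|^4 + |\tilde\xi|^2 , \tag{180}$$

$$|\tilde\xi|^2 = |\bar\xi|^2 \bigl(1 - |\bar\xi|^2\bigr) . \tag{181}$$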
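
The relations (179) to (181) are easy to confirm numerically for a single random unit vector. The sketch below is only an illustration: the names are invented, and the placement of the complex conjugate in the inner product is an assumption made for the example (the relations do not depend on that choice).

```python
import random


def inner(zeta, chi):
    """(zeta, chi) = sum over tau of zeta_tau * conj(chi_tau).
    Assumption of this sketch: the conjugate sits on the second argument."""
    return sum(z * c.conjugate() for z, c in zip(zeta, chi))


def decompose(s, S, seed=0):
    """Return xi (random unit vector), xi_bar (first s components of xi, rest zero)
    and xi_tilde (the part of xi_bar orthogonal to xi), as in (170)."""
    rng = random.Random(seed)
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(S)]
    norm = abs(inner(g, g)) ** 0.5
    xi = [c / norm for c in g]
    xi_bar = xi[:s] + [0j] * (S - s)
    coeff = inner(xi_bar, xi)  # (xi_bar, xi), equal to |xi_bar|^2 by (179)
    xi_tilde = [b - coeff * x for b, x in zip(xi_bar, xi)]
    return xi, xi_bar, xi_tilde


if __name__ == "__main__":
    xi, xi_bar, xi_tilde = decompose(s=3, S=8, seed=2)
    w = inner(xi_bar, xi_bar).real
    print(abs(inner(xi_tilde, xi)))            # orthogonality of xi_tilde and xi
    print(inner(xi_tilde, xi_tilde).real, w * (1 - w))   # both sides of (181)
```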
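Since $\xi = \{\xi_{1|\rho}, \ldots, \xi_{S|\rho}\}$ varies on the unit sphere, the event

$$w \le |\bar\xi|^2 \le w + dw \tag{182}$$

($0 \le w \le 1$), i.e.,

$$w \le \sum_{\tau=1}^{s} |\xi_{\tau|\rho}|^2 \le w + dw , \tag{183}$$

has probability

$$\frac{(S-1)!}{(s-1)!\,(S-s-1)!}\, w^{s-1}(1-w)^{S-s-1}\,dw . \tag{184}$$

In order to obtain the total probability density of $|e_{\rho\sigma}|^2$, we thus need to integrate

$$\frac{(S-1)!}{(s-1)!\,(S-s-1)!}\, w^{s-1}(1-w)^{S-s-1} \times (S-2)\, \frac{(w(1-w) - u)^{S-3}}{(w(1-w))^{S-2}}\,dw = \frac{(S-1)!\,(S-2)}{(s-1)!\,(S-s-1)!}\, \frac{(w(1-w) - u)^{S-3}}{w^{S-s-1}(1-w)^{s-1}}\,dw \tag{185}$$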

^49 Note of the translator: In the German original, the formula corresponding to (178) has $S-1$ instead
of $S-2$ and $S-2$ instead of $S-3$. This mistake propagates through all further formulas in the German
original but does not affect the final result. Here and in the following, we give the correct exponents.
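over those $w \in [0, 1]$ with $u \le w(1-w)$. As a consequence, only values in $[0, \tfrac14]$ can
arise for $u$. We now determine the probability of $|e_{\rho\sigma}|^2 > a$ (with $0 < a < \tfrac14$), and to
this end we need to integrate (185) over those $u, w$ with $a \le u \le w(1-w)$, i.e., over
those $u, w$ with

$$\tfrac12 - \sqrt{\tfrac14 - a} \le w \le \tfrac12 + \sqrt{\tfrac14 - a} , \qquad a \le u \le w(1-w) . \tag{186}$$

We can carry out the integration over $u$:^50

$$\int_{\frac12 - \sqrt{\frac14 - a}}^{\frac12 + \sqrt{\frac14 - a}} \int_a^{w(1-w)} \frac{(S-1)!\,(S-2)}{(s-1)!\,(S-s-1)!}\, \frac{(w(1-w) - u)^{S-3}}{w^{S-s-1}(1-w)^{s-1}}\,du\,dw = \frac{(S-1)!}{(s-1)!\,(S-s-1)!} \int_{\frac12 - \sqrt{\frac14 - a}}^{\frac12 + \sqrt{\frac14 - a}} \frac{(w(1-w) - a)^{S-2}}{w^{S-s-1}(1-w)^{s-1}}\,dw . \tag{187}$$

We decompose the integral into two parts,

$$\int_{\frac12 - \sqrt{\frac14 - a}}^{\frac12} \quad \text{and} \quad \int_{\frac12}^{\frac12 + \sqrt{\frac14 - a}} ,$$

and introduce the new variable $x$ according to

$$\tfrac12 - \sqrt{\tfrac14 - x} = w , \quad \text{respectively} \quad \tfrac12 + \sqrt{\tfrac14 - x} = w . \tag{188}$$

In both cases we have that $x = w(1-w)$, and in both cases $x$ runs from $a$ to $\tfrac14$.
Combining both integrals, we arrive at

$$\frac{(S-1)!}{(s-1)!\,(S-s-1)!} \int_a^{\frac14} (x - a)^{S-2} \Bigl[ \bigl(\tfrac12 + \sqrt{\tfrac14 - x}\bigr)^{-(S-s-1)} \bigl(\tfrac12 - \sqrt{\tfrac14 - x}\bigr)^{-(s-1)} + \bigl(\tfrac12 - \sqrt{\tfrac14 - x}\bigr)^{-(S-s-1)} \bigl(\tfrac12 + \sqrt{\tfrac14 - x}\bigr)^{-(s-1)} \Bigr] \frac{dx}{2\sqrt{\tfrac14 - x}} . \tag{189}$$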

Finally, we introduce the new variable

$$y = \frac{x - a}{\frac14 - a} , \tag{190}$$



^50 Note of the translator: In the German original, (187) contains an inconsistency (the numerators
of the integrands in the left and right hand sides have equal exponents) that partly compensates the
mistake about exponents in (178).
which runs from 0 to 1. The above expression then becomes^51

$$\frac{(1-4a)^{S-\frac32}\,(S-1)!}{2^{S}\,(s-1)!\,(S-s-1)!} \times \int_0^1 y^{S-2} \Bigl[ \bigl(1 + \sqrt{1-4a}\sqrt{1-y}\bigr)^{-(S-s-1)} \bigl(1 - \sqrt{1-4a}\sqrt{1-y}\bigr)^{-(s-1)} + \bigl(1 - \sqrt{1-4a}\sqrt{1-y}\bigr)^{-(S-s-1)} \bigl(1 + \sqrt{1-4a}\sqrt{1-y}\bigr)^{-(s-1)} \Bigr] \frac{dy}{\sqrt{1-y}} . \tag{191}$$

Once we divide this probability by $(1-4a)^{S-\frac32}$, only the square bracket depends on $a$.
As we will show, the square bracket increases as $a \to 0$, and thus so does the quotient
[i.e., (191)$/(1-4a)^{S-\frac32}$]. Since for $a = 0$, (191) is 1, as well as $(1-4a)^{S-\frac32} = 1$, this
implies that the quotient is always less than or equal to 1, and thus

$$(191) \le (1-4a)^{S-\frac32} \le e^{-4a(S-\frac32)} . \tag{192}$$



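As $a \to 0$, $\sqrt{1-4a}\sqrt{1-y}$ tends, monotonically increasingly, to $\sqrt{1-y}$, so it suffices
to show that

$$\bigl[ (1+t)^{-(S-s-1)}(1-t)^{-(s-1)} + (1-t)^{-(S-s-1)}(1+t)^{-(s-1)} \bigr] \tag{193}$$

is an increasing function of $t$ if $t \ge 0$ [and $t < 1$]. Indeed, its derivative

$$(1-t)^{-(S-s-1)}(1+t)^{-(s-1)} \Bigl(\frac{S-s-1}{1-t} - \frac{s-1}{1+t}\Bigr) + (1+t)^{-(S-s-1)}(1-t)^{-(s-1)} \Bigl(\frac{s-1}{1-t} - \frac{S-s-1}{1+t}\Bigr) \tag{194}$$

is positive if (we set $z = \frac{1+t}{1-t} \ge 1$) [as we see by multiplying (194) by $(1+t)^{S-1} > 0$]

$$z^{s-1}\bigl((s-1)z - (S-s-1)\bigr) + z^{S-s-1}\bigl((S-s-1)z - (s-1)\bigr) \ge 0 , \tag{195}$$

but this expression is obviously greater than or equal to

$$z^{s-1}\bigl((s-1) - (S-s-1)\bigr) + z^{S-s-1}\bigl((S-s-1) - (s-1)\bigr) = (z^{S-s-1} - z^{s-1})(S-2s) \ge 0 \tag{196}$$

[because $z \ge 1$ and $S \ge 2s + 2$]. Thus, we have verified the above bound for the
probability of $|e_{\rho\sigma}|^2 > a$ for a fixed pair $\rho \neq \sigma$, $\rho, \sigma = 1, \ldots, S$. The probability that
this occurs for any such $\rho, \sigma$, i.e., the probability of

$$M = \max_{\rho,\sigma = 1,\ldots,S;\ \rho \neq \sigma} \bigl(|e_{\rho\sigma}|^2\bigr) > a , \tag{197}$$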


^51 Note of the translator: In the German original, a factor $(1-4a)$ is missing here and in the
following equations. This mistake does not affect the final result.
is larger by at most a factor of $S(S-1)/2$ (because of $e_{\rho\sigma} = e^*_{\sigma\rho}$ it suffices to consider
$\rho < \sigma$), and thus is less than or equal to

$$\frac{S(S-1)}{2}\, e^{-4a(S-\frac32)} . \tag{198}$$

The average of $M$ we estimate again in two parts: for values in $[0, a]$ the probability is
certainly $\le 1$, for values in $[a, \tfrac14]$ we have the above bound. Therefore:

$$\mathfrak{M}(M) \le a + \frac{S(S-1)}{8}\, e^{-4a(S-\frac32)} . \tag{199}$$

For $a$ we can choose any number $> 0$, $\ll 1$; we set

$$a = \frac{3 \ln S}{4S} . \tag{200}$$

(This fulfills all requirements because of $S \gg 1$.) Our upper bound thus becomes

$$\frac{3 \ln S}{4S} + \frac{S(S-1)}{8}\, e^{-3 \ln S \cdot \frac{S-\frac32}{S}} \approx \frac{3 \ln S}{4S} + \frac{S^2}{8}\, e^{-3 \ln S} = \frac{3 \ln S}{4S} + \frac{1}{8S} \approx \frac{3 \ln S}{4S} . \tag{201}$$

Thus, if the premise $S \gg 1$ is satisfied to a sufficient extent then the above average is
less than or equal to $\ln S/S$.

This completes the proof of the desired estimates. 
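
As a rough plausibility check of the two final estimates, one can sample random unitary matrices and average $M$ and $N$ directly. The sketch below is an illustration only and proves nothing: the bounds $\ln S/S$ and $9 s \ln S/S^2$ are asymptotic statements for $1 \ll s \ll S$, while the example uses small, freely chosen $s$ and $S$. It assumes that orthonormalizing independent complex Gaussian vectors yields the unitarily invariant distribution; all names are invented for the example.

```python
import math
import random


def haar_unitary_columns(S, rng):
    """Columns of a random S x S unitary matrix, obtained by Gram-Schmidt
    orthonormalization of independent complex Gaussian vectors."""
    columns = []
    for _ in range(S):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(S)]
        for q in columns:
            overlap = sum(a.conjugate() * b for a, b in zip(q, v))
            v = [b - overlap * a for a, b in zip(q, v)]
        norm = math.sqrt(sum(abs(c) ** 2 for c in v))
        columns.append([c / norm for c in v])
    return columns


def error_bounds(s, columns):
    """M = max over rho < sigma of |e_rho,sigma|^2, N = max over rho of (e_rho,rho - s/S)^2."""
    S = len(columns)
    top = [col[:s] for col in columns]  # components tau = 1..s of every column
    M = max(
        abs(sum(a.conjugate() * b for a, b in zip(top[r], top[c]))) ** 2
        for r in range(S)
        for c in range(r + 1, S)
    )
    N = max((sum(abs(a) ** 2 for a in col) - s / S) ** 2 for col in top)
    return M, N


def average_bounds(s, S, n_samples, seed=0):
    """Sample means of M and N over n_samples random unitary matrices."""
    rng = random.Random(seed)
    total_M = total_N = 0.0
    for _ in range(n_samples):
        M, N = error_bounds(s, haar_unitary_columns(S, rng))
        total_M += M
        total_N += N
    return total_M / n_samples, total_N / n_samples


if __name__ == "__main__":
    s, S = 4, 16
    avg_M, avg_N = average_bounds(s, S, 100, seed=3)
    print(avg_M, math.log(S) / S)
    print(avg_N, 9 * s * math.log(S) / S ** 2)
```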